Abstract

We study the problem of communicating a distributed correlated memoryless source over a memoryless network, from source 
nodes to destination nodes, under quadratic distortion constraints. We establish the following two complementary results: (a) 
for an arbitrary memoryless network, among all distributed memoryless sources of a given correlation, Gaussian sources are 
least compressible, that is, they admit the smallest set of achievable distortion tuples, and (b) for any memoryless source to 
be communicated over a memoryless additive-noise network, among all noise processes of a given correlation, Gaussian noise 
admits the smallest achievable set of distortion tuples. We establish these results constructively by showing how schemes for the 
corresponding Gaussian problems can be applied to achieve similar performance for (source or noise) distributions that are not 
necessarily Gaussian but have the same covariance. 



I. INTRODUCTION

Stochastic modeling of the data source and the communication medium is essential in data compression and data
communication problems. However, extracting these descriptions from a practical system is in general difficult and often 
leads to intractable problems from a theoretical point of view. As a result, Gaussian models for both the data sources and the 
noise in communication networks prevail. 

The modeling of the noise in communication links as additive Gaussian is generally justified through the Central Limit 
Theorem, which suggests that the cumulative effect of many independent noise sources should be approximately Gaussian. 
The modeling of data sources as Gaussian, on the other hand, is less justifiable and done largely for the sake of analytical
tractability.

From a theoretical standpoint, one way of supporting the Gaussian assumption is by establishing that it is worst-case, meaning 
that, within a given family of distributions (usually defined by a covariance constraint), the Gaussian assumption results in the
smallest possible capacity or rate-distortion region. In fact, this has long been known to be the case in two classical single-user
Information Theory scenarios. In the channel coding setting, it is known that, given a fixed variance of the noise, the Gaussian
distribution minimizes the capacity of a memoryless additive-noise channel. The source coding counterpart of this result is
that, for a fixed-variance i.i.d. random source, the Gaussian distribution minimizes the rate-distortion region. Both of these
assertions can be proved using the fact that, subject to a variance constraint, the Gaussian distribution maximizes the entropy.
In the channel coding case, a more operational proof of the fact that Gaussian noise is the worst-case noise was provided in [1],
where it was shown that random Gaussian codebooks and nearest-neighbor decoding achieve the capacity of the corresponding 
AWGN channel on a non-Gaussian channel. 
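The maximum-entropy fact invoked above can be checked numerically for a few textbook distributions. The sketch below is an illustration only, not part of the original argument: it compares closed-form differential entropies (in nats) of a Gaussian, a uniform, and a Laplace distribution with a common variance. The choice of comparison distributions and the function names are assumptions made for the example.

```python
import math

def gaussian_entropy(var):
    # h(N(0, var)) = 0.5 * ln(2 * pi * e * var)
    return 0.5 * math.log(2 * math.pi * math.e * var)

def uniform_entropy(var):
    # A uniform density on an interval of width w has variance w^2 / 12
    # and differential entropy ln(w).
    return math.log(math.sqrt(12 * var))

def laplace_entropy(var):
    # A Laplace density with scale beta has variance 2 * beta^2
    # and differential entropy 1 + ln(2 * beta).
    beta = math.sqrt(var / 2)
    return 1 + math.log(2 * beta)

var = 1.0
entropies = {
    "gaussian": gaussian_entropy(var),
    "uniform": uniform_entropy(var),
    "laplace": laplace_entropy(var),
}
```

For any common variance, the Gaussian entry is the largest of the three, which is consistent with the entropy-maximizing property used in the classical proofs.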



There are a few other worst-case characterizations in the literature. One example is [2], where the authors consider vector
channels with additive noise subject to the constraint that the noise covariance matrix lies in a convex set. It is shown that, in
this setting, the worst-case noise is vector Gaussian with a covariance matrix that depends on the transmit power constraints.
In [3], a scalar additive-noise channel with binary input is considered. In this setting, the probability mass function of the
(discrete) worst-case noise is characterized, and the worst-case capacity (i.e., the capacity under the worst-case noise) is found.
Another example is the work of [4] that characterizes the rate-distortion region for the two-encoder source coding problem with
quadratic distortion constraints and Gaussian sources, which in turn allows the characterization of the joint Gaussian source
as the worst-case source for the two-encoder quadratic source coding problem.

Beyond the aforementioned examples, worst-case analysis of more general multi-user networks was, until recently, fairly
limited. The main challenge lay in the fact that most multi-user Information Theory problems remain unsolved, i.e., without an
explicit characterization of the capacity or rate-distortion regions. Recently, a new approach was introduced in [5] that made it
possible to generalize the worst-case noise result from additive-noise point-to-point channels to arbitrary linear additive-noise wireless
networks.¹ The framework in [5] can be described in two main steps. First, a DFT (Discrete Fourier Transform)-based linear
transformation is applied to all transmitted and received signals in the network in order to create an effective network where
the additive-noise terms are "approximately Gaussian".² Next, by demonstrating the optimality of coding schemes with finite
precision³ in Gaussian networks, it is proven that the capacity region of the Gaussian network is contained in the capacity
region of the effective network asymptotically (as the size of the blocks to which we apply the DFT-based transformation
increases). This approach was later utilized in [6] to establish that Gaussian sources are worst-case data sources for distributed
compression of correlated sources over rate-constrained, noiseless channels, with a quadratic distortion measure (i.e., in the
context of the quadratic $k$-encoder source coding problem).

A shorter version of this paper was submitted to the International Symposium on Information Theory 2013.

1 In these networks, the received signal at each node is a linear combination of the transmit signals at all other nodes plus a noise term.
2 In the sense that their distribution converges to a Gaussian distribution as the size of the blocks to which we apply the DFT-based transformation increases.
3 In coding schemes with finite precision, the encoding and decoding operations of the nodes in the network may only take inputs with a finite decimal
expansion. This precision can become arbitrarily large as the coding block length increases.

In this work, we pursue the analogue of these worst-case results in joint source-channel coding, by considering the problem 
of distributed compression of information over an arbitrary network. More precisely, k nodes in the network have access to 
correlated stochastic sources and wish to transmit them over an $N$-node network to respective destinations. A coding scheme
is employed to define the encoding, relaying and decoding operations of the network nodes, and its performance metric is the 
mean square error in the destinations' reconstruction of their desired sources. This problem lies at the heart of increasingly 
many applications concerning distributed compression of information over a network, such as sensor networks. 

Since this setup involves the modeling of both the sources and the network, the worst-case characterization takes the form 
of two related sub-questions: 

• Question 1: Given an arbitrary memoryless network and a fixed correlation among the components of the distributed
memoryless source, are jointly Gaussian sources the least compressible? In other words, do they have the smallest set
of achievable distortion tuples?

• Question 2: Given an arbitrary memoryless distributed source, for an additive-noise network with a given noise correlation, 
is the Gaussian noise worst-case, in the sense of having the smallest set of achievable distortion tuples? 

In this paper, we answer both of these questions in the affirmative. We utilize the aforementioned framework to propose a 
universal way of converting a coding scheme designed under the Gaussian assumption into coding schemes that can handle 
and attain similar performances for non-Gaussian sources or noises. In particular, we start by using the DFT-based linear 
transformation as a way to make either the sources or the noises approximately Gaussian. Since this operation introduces a 
statistical dependence between the resulting sources or noises, an interleaving scheme is employed, in order to create blocks of 
i.i.d. approximately Gaussian sources and noises. Within each of the resulting blocks we then apply the original coding scheme 
designed under Gaussian models. We show that such a scheme, when performed over sufficiently long blocks, can achieve 
distortions arbitrarily close to those achieved by the original coding scheme designed for Gaussian sources or noises. This is 
done by showing that our original scheme can be assumed without loss of generality to satisfy two properties: finite precision³
and bounded outputs.⁴ These properties allow us to use standard tools regarding the convergence of random variables, such as
the Dominated Convergence Theorem, to bound the distortion attained by the new coding scheme constructed based on the 
DFT-based linear transformation. 

Our contribution lies not only in answering the above two questions in the affirmative and showing the worst-case nature 
of Gaussian assumptions, but also in describing a systematic way of converting coding schemes designed under Gaussian 
assumptions into coding schemes that can handle non-Gaussian assumptions. The construction is conceptually simple: it
relies on DFT-based linear transformations, which also makes the resulting schemes algorithmically tractable.

The rest of the paper is organized as follows. Section II presents the formal problem formulation along with the main results
of the paper. An overview of the main ingredients used in the proofs of the main results is provided in Section III. Section IV
studies the problem of finding the worst-case source, given a fixed correlation matrix, for compression over a given memoryless
network, while the worst-case nature of Gaussian additive noise, again for a fixed correlation structure, is proven for an arbitrary
memoryless distributed source in Section V. The paper is concluded in Section VI.



II. Problem Formulation and Main Results 

We are given a (stochastic) network, where source nodes want to communicate correlated memoryless sources across 
the network to respective destination nodes, subject to a distortion constraint. As outlined in Section I, we address two
complementary questions in this paper. First, we consider characterizing, for a fixed network, the worst-case source distribution; 
i.e., the source distribution for which the set of achievable distortion constraints is smallest. In order to make this question 
meaningful, we fix the covariance of the joint distribution of the sources. Second, we consider fixing the source distribution 
and asking what is the worst-case noise in the network. To make the latter problem well-posed, we focus on additive-noise 
networks, where the covariance matrix of the noise terms is fixed. 

In order to formally state these two problems, we will need the following notation. We refer to an $n$-tuple $\{X[i]\}_{i=0}^{n-1}$
by both $X^n$ and $\mathbf{X}$ (when the size of the tuple $n$ is clear from the context). If a random variable $X$ has a probability
density function, it is denoted as $f_X(x)$, and if the conditional distribution of $X$ given $Y$ has a conditional probability density
function, it is denoted as $f_{X|Y}(x|y)$. The notation $[0:k]$ is shorthand for the set of natural numbers $\{0, 1, \ldots, k\}$, and
$X_i[0:k] = \{X_i[0], X_i[1], \ldots, X_i[k]\}$.

A $(k, N)$-memoryless network, illustrated in Fig. 1, is characterized by the conditional density $f_{Y_1,\ldots,Y_N|U_1,\ldots,U_N}$, which
relates the real-valued network inputs $(U_1, \ldots, U_N)$ to the real-valued network outputs $(Y_1, \ldots, Y_N)$. The set of source nodes is
denoted as $\mathcal{S} = \{s_1, s_2, \ldots, s_k\} \subseteq [1:N]$, and the set of destination nodes is denoted as $\mathcal{D} = \{d_1, d_2, \ldots, d_k\} \subseteq [1:N]$. The
remaining nodes (we assume without loss of generality that the sets of source and destination nodes have empty intersection)
are relays $\mathcal{R} = \{r_1, r_2, \ldots, r_{N-2k}\} \subseteq [1:N]$. Source node $s_m \in \mathcal{S}$ has access to the i.i.d. source $X_m[t]$, $t = 0, 1, \ldots$,
which must be communicated to the corresponding destination node $d_m \in \mathcal{D}$.⁴ The i.i.d. vectors $(X_1[t], \ldots, X_k[t])$ have a joint
distribution with covariance matrix $K$.

Fig. 1. $(k, N)$-memoryless network. $\phi(\cdot, \cdot)$ refers to the squared error.

4 In a coding scheme with bounded outputs, each component of the source reconstruction sequences produced by the destinations cannot exceed a given
number $M$.

Definition 1. A coding scheme $\mathcal{C}$ with block length $n \in \mathbb{N}$ for distributed compression of a real-valued memoryless source
$(X_1, X_2, \ldots, X_k)$ over a $(k, N)$-memoryless network consists of the following:

1) Source Encoding Functions: Source node $s_m \in \mathcal{S}$ encodes the source $\mathbf{X}_m$ as $U_{s_m}[t] = f_{s_m,t}(\mathbf{X}_m, Y_{s_m}^{t-1})$, $\forall\, t \in [0:n-1]$,
where $f_{s_m,t} : \mathbb{R}^n \times \mathbb{R}^{t-1} \to \mathbb{R}$, $\forall\, m \in [1:k]$, $\forall\, t \in [0:n-1]$, are the source encoding functions.⁵

2) Relay Encoding Functions: Relay node $r_p \in \mathcal{R}$ receives the channel outputs from the network and encodes them as
$U_{r_p}[t] = f_{r_p,t}(Y_{r_p}^{t-1})$, $\forall\, t \in [0:n-1]$, where $f_{r_p,t} : \mathbb{R}^{t-1} \to \mathbb{R}$, $\forall\, p \in [1:N-2k]$, $\forall\, t \in [0:n-1]$, are the relay encoding
functions.

3) Destination Encoding Functions: Destination node $d_m \in \mathcal{D}$ receives the channel output from the network and encodes
it as $U_{d_m}[t] = f_{d_m,t}(Y_{d_m}^{t-1})$, where $f_{d_m,t} : \mathbb{R}^{t-1} \to \mathbb{R}$, $\forall\, m \in [1:k]$, $\forall\, t \in [0:n-1]$, are the destination encoding
functions.

4) Destination Decoding Functions: At the end of the block of communication, each destination $d_m \in \mathcal{D}$ constructs an
estimate of the source as $\hat{\mathbf{X}}_m = g_{d_m}(\mathbf{Y}_{d_m})$, where $g_{d_m} : \mathbb{R}^n \to \mathbb{R}^n$, $\forall\, m \in [1:k]$, are the destination decoding functions.

Definition 2. A distortion measure is a mapping $\phi : \mathbb{R} \times \mathbb{R} \to \mathbb{R}^+$.

Definition 3. A distortion tuple $(D_1, D_2, \ldots, D_k)$ is said to be $\phi$-achievable if, for some block length $n$, there exists a coding
scheme $\mathcal{C}$, as described above, such that

$$\frac{1}{n} \sum_{t=0}^{n-1} E\left[\phi\big(X_m[t], \hat{X}_m[t]\big)\right] \le D_m, \quad \forall\, m \in [1:k]. \tag{1}$$

We focus on the quadratic distortion measure, i.e., where $\phi(x, y) = \xi(x, y) = (x - y)^2$. Notice that, in this case, the
expression in (1) can be equivalently written as

$$\frac{1}{n} E\left\| \mathbf{X}_m - \hat{\mathbf{X}}_m \right\|^2 \le D_m, \quad \forall\, m \in [1:k].$$
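As a small illustration of the quadratic distortion criterion, the following sketch computes the per-symbol squared error between a source block and its reconstruction, and checks it against target distortions. It is a single-realization analogue of (1): the expectation in the definition is replaced by one sample path, which is an assumption made purely for illustration. The function names are hypothetical.

```python
def quadratic_distortion(x, x_hat):
    """Return (1/n) * ||x - x_hat||^2 for two equal-length sequences."""
    if len(x) != len(x_hat):
        raise ValueError("sequences must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def is_within_targets(sources, reconstructions, targets):
    """Check the per-destination constraint for one realization.

    sources[m] and reconstructions[m] are the blocks for pair (s_m, d_m),
    and targets[m] plays the role of D_m.
    """
    return all(
        quadratic_distortion(x, x_hat) <= d
        for x, x_hat, d in zip(sources, reconstructions, targets)
    )

example = quadratic_distortion([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```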



Definition 4. The $\phi$-achievable distortion region $D$ of a $(k, N)$-memoryless network is the closure of the set of achievable
distortion tuples.



Theorem 1 (Main Result 1: Worst-Case Source for $(k, N)$-memoryless network). For a $(k, N)$-memoryless network, let
$\mathcal{D}_{NG}^{\mathrm{source}}$ and $\mathcal{D}_{G}^{\mathrm{source}}$ stand for the $\xi$-achievable distortion regions for an arbitrary memoryless non-Gaussian source with
covariance matrix $K$ and for a memoryless Gaussian source with the same covariance matrix, respectively. Then

$$\mathcal{D}_{G}^{\mathrm{source}} \subseteq \mathcal{D}_{NG}^{\mathrm{source}}. \tag{2}$$

5 Here and throughout we use the terms 'functions' and 'mappings' interchangeably and assume that they are measurable. 



Note 1. A special case of Theorem 1 is that of wireline networks where each link is a (noiseless) bit pipe. This gives us the
result that the Gaussian source is the worst-case source for the $k$-encoder distributed compression problem studied in [6].

In order to state our second main result, we focus on the following class of networks. 

Definition 5. A $(k, N)$-memoryless network is said to be an additive-noise network if the input-output relationship is given by

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix} = H \begin{bmatrix} U_1 \\ U_2 \\ \vdots \\ U_N \end{bmatrix} + \begin{bmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_N \end{bmatrix}, \tag{3}$$

where $H$ is a real-valued $N \times N$ matrix and $(Z_1, \ldots, Z_N)$ is a noise vector with joint distribution $f_{\mathbf{Z}}$, independent of
$(U_1, \ldots, U_N)$. If $(Z_1, \ldots, Z_N)$ is distributed as $\mathcal{N}(0, K)$ for some covariance matrix $K$, then we call the network a $(k, N)$-
additive white Gaussian noise (AWGN) network.
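The input-output relationship of an additive-noise network is a plain matrix-vector product plus noise. The sketch below evaluates it for one time step. The 3-node matrix, the inputs, and the use of independent unit-variance Gaussian noise (i.e., the special case where the noise covariance is the identity) are hypothetical values chosen only for the example.

```python
import random

def network_outputs(h, u, z):
    """Compute Y = H U + Z for one time step (h is a list of rows)."""
    return [
        sum(h_ij * u_j for h_ij, u_j in zip(row, u)) + z_i
        for row, z_i in zip(h, z)
    ]

# Hypothetical 3-node network: entry H[i][j] is the gain from node j to node i.
H = [
    [0.0, 0.5, 0.2],
    [1.0, 0.0, 0.3],
    [0.7, 0.4, 0.0],
]
rng = random.Random(0)
u = [1.0, -1.0, 0.5]                          # transmit signals at one time step
z = [rng.gauss(0.0, 1.0) for _ in range(3)]   # independent noise, covariance = identity
y = network_outputs(H, u, z)
```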



Theorem 2 (Main Result 2: Worst-Case Noise for $(k, N)$-memoryless additive-noise network). For an arbitrary source of
finite covariance and a $(k, N)$-memoryless additive-noise network, let $\mathcal{D}_{NG}^{\mathrm{noise}}$ and $\mathcal{D}_{G}^{\mathrm{noise}}$ stand for the $\xi$-achievable distortion
regions for an arbitrary non-Gaussian additive-noise distribution with covariance matrix $K$ and for additive Gaussian noise
with the same covariance matrix, respectively. Then

$$\mathcal{D}_{G}^{\mathrm{noise}} \subseteq \mathcal{D}_{NG}^{\mathrm{noise}}. \tag{4}$$

Note 2. In [5], Gaussian noise was also characterized as the worst-case additive noise in wireless networks. However, [5]
considers a channel coding setting, and Gaussian noise is shown to minimize the capacity region, while, in this paper, we
focus on a joint source-channel coding setting, and Theorem 2 establishes that Gaussian noise minimizes the distortion region.

III. Overview of Proof Ingredients

In this section, we give an overview of the main proof ingredients, describing at a high level how they are connected, and 
highlighting the connections between the proofs of Theorems 1 and 2.

The overarching idea is to use a coding scheme for distributed compression designed for a Gaussian model (Gaussian source 
or Gaussian additive-noise network) to construct a new coding scheme that achieves approximately the same distortion tuple 
when the source or the additive noises are not Gaussian but have the same covariance as in the Gaussian case. 

The first main step in the construction of this new coding scheme is to utilize the DFT-based linear transformation introduced
in [5] in order to transform blocks of i.i.d. non-Gaussian random variables into "approximately Gaussian" random variables.
More specifically, we define the unitary $b \times b$ matrix $Q$ (for simplicity we assume $b$ to be even) by setting the entry in the
$(i+1)$th row and $(j+1)$th column to be

$$Q(i,j) = \begin{cases} 1/\sqrt{b} & \text{if } i = 0, \\ \sqrt{2/b}\,\cos(2\pi i j / b) & \text{if } i = 1, \ldots, b/2 - 1, \\ (-1)^j/\sqrt{b} & \text{if } i = b/2, \\ \sqrt{2/b}\,\sin\big(2\pi (i - b/2) j / b\big) & \text{if } i = b/2 + 1, \ldots, b - 1, \end{cases} \tag{5}$$

for $i, j \in \{0, \ldots, b-1\}$. Applying $Q$ to a vector $\mathbf{x}$ can be intuitively seen as first taking the DFT of $\mathbf{x}$, then separating the
real and imaginary parts of the resulting vector, and renormalizing them so that the resulting transformation is unitary. It is
readily verified that $Q$ is a unitary transformation, i.e., that $\|Q\mathbf{x}\| = \|\mathbf{x}\|$ for any $\mathbf{x} \in \mathbb{R}^b$.⁶
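The norm-preserving property can be checked numerically. The sketch below builds one standard real-valued DFT matrix that matches the description above (a constant row, cosine rows, an alternating-sign row, and sine rows); the exact row ordering is an assumption made for this example, and the helper names are hypothetical.

```python
import math

def build_q(b):
    """Real DFT-based b x b orthogonal matrix (b even), as a list of rows."""
    if b < 2 or b % 2 != 0:
        raise ValueError("b must be a positive even integer")
    half = b // 2

    def entry(i, j):
        if i == 0:
            return 1 / math.sqrt(b)
        if i < half:
            return math.sqrt(2 / b) * math.cos(2 * math.pi * i * j / b)
        if i == half:
            return (-1) ** j / math.sqrt(b)
        return math.sqrt(2 / b) * math.sin(2 * math.pi * (i - half) * j / b)

    return [[entry(i, j) for j in range(b)] for i in range(b)]

def apply_q(q, x):
    """Matrix-vector product Q x."""
    return [sum(q_ij * x_j for q_ij, x_j in zip(row, x)) for row in q]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

q = build_q(8)
x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
norm_gap = abs(norm(apply_q(q, x)) - norm(x))   # zero up to rounding error
```

Because the rows are orthonormal, the inverse of this matrix is simply its transpose, which is convenient when the transformation has to be undone at the decoders.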

The fact that, for any random vector $\mathbf{x}$ with i.i.d. non-Gaussian random variables, $Q\mathbf{x}$ converges in distribution to a Gaussian
random vector (as $b$ increases) was formalized in [5], [6]. For random variables $X_1, X_2, \ldots$ and $X$, we let $X_n \xrightarrow{d} X$ mean
that $X_n$ converges in distribution to $X$ as $n \to \infty$. The following lemma was first proven in [5], but we include a proof in
Appendix A for completeness.

Lemma 1 (Convergence Lemma). Suppose $\{(X_1[i], \ldots, X_k[i])\}_{i=0}^{\infty}$ is an i.i.d. sequence of length-$k$ random vectors with
covariance matrix $K$, and let $Q$ be the unitary linear transformation in (5) and

$$\big(\tilde{X}_m^{(0)}[t], \ldots, \tilde{X}_m^{(b-1)}[t]\big)^T = Q\,\big(X_m[tb], \ldots, X_m[(t+1)b-1]\big)^T, \quad m = 1, \ldots, k, \tag{6}$$

for $t = 0, 1, \ldots, n-1$. Then, for any sequence $\ell_b$ such that, for $b = 1, 2, \ldots$, $\ell_b \in \{0, 1, \ldots, b-1\}$, and any $t \in \{0, 1, \ldots, n-1\}$,

$$\big(\tilde{X}_1^{(\ell_b)}[t], \ldots, \tilde{X}_k^{(\ell_b)}[t]\big) \xrightarrow{d} \mathcal{N}(0, K), \quad \text{as } b \to \infty.$$

6 Note that one can potentially come up with other choices for this transformation as well. Intuitively, $Q$ should not put large mass on any of its components
and should distribute the mass almost uniformly. Mathematically, $Q$ should be unitary and should satisfy the Lindeberg condition (see the proof of Lemma 1), so as
to have the corresponding Central Limit-type theorem (Lemma 1). Our particular choice of $Q$ is made for concreteness, mathematical convenience, and also
due to the practical consideration that it would have an FFT-like implementation.
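One way to get a feel for why the transformed symbols become "approximately Gaussian" is to look at how evenly each row of $Q$ spreads its weight. For a weighted sum $\sum_j q_j X_j$ of i.i.d. zero-mean variables with $\sum_j q_j^2 = 1$, the excess kurtosis equals the excess kurtosis of $X_j$ times $\sum_j q_j^4$ (a standard cumulant identity). The sketch below computes the largest such row sum for growing $b$. This is an illustration only and not the argument used in Appendix A; the matrix construction repeats the assumed row ordering of the real DFT matrix, and the function names are hypothetical.

```python
import math

def build_q(b):
    """Real DFT-based b x b orthogonal matrix (b even); row order is assumed."""
    half = b // 2

    def entry(i, j):
        if i == 0:
            return 1 / math.sqrt(b)
        if i < half:
            return math.sqrt(2 / b) * math.cos(2 * math.pi * i * j / b)
        if i == half:
            return (-1) ** j / math.sqrt(b)
        return math.sqrt(2 / b) * math.sin(2 * math.pi * (i - half) * j / b)

    return [[entry(i, j) for j in range(b)] for i in range(b)]

def max_row_fourth_power_sum(b):
    """Largest value of sum_j Q(i, j)^4 over all rows i."""
    return max(sum(v ** 4 for v in row) for row in build_q(b))

# The worst-case row sum shrinks roughly like 1/b, so the excess kurtosis
# of every transformed component vanishes as the block size b grows.
spread = {b: max_row_fourth_power_sum(b) for b in (8, 64, 512)}
```

Vanishing kurtosis alone does not prove convergence in distribution; it is only a quick sanity check that no single source symbol dominates any transformed component.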



In the proof of Theorem 1, we apply $Q$ to blocks of $b$ source symbols in order to create an effective source which is
approximately Gaussian. Similarly, in the proof of Theorem 2, we apply $Q^{-1}$ to blocks of $b$ transmit signals, and $Q$ to blocks
of $b$ received signals, in order to make the effective additive noises approximately Gaussian. Since the application of the linear
transformation $Q$ (and $Q^{-1}$) results in statistical dependencies between the resulting sources or noises, a simple interleaving
scheme is employed in order to create i.i.d. approximately Gaussian sources or noises. We then apply a coding scheme designed
to achieve a given distortion tuple $(D_1, \ldots, D_k)$ under Gaussian assumptions to these resulting i.i.d. blocks.

The main technical challenge in the proofs of Theorems 1 and 2 is to show that, as $b \to \infty$, the resulting distortion converges
to $(D_1, \ldots, D_k)$. In order to do this, we establish several technical lemmas, which allow us to assume without loss of generality
that our original coding scheme designed for a Gaussian model satisfies certain properties.

First, we need a lemma that allows us to restrict attention to bounded output coding schemes; i.e., coding schemes in which 
the output of the decoding functions is bounded. Thus we need to show that any achievable distortion tuple can be attained 
arbitrarily closely by a bounded output scheme. The advantage of dealing with coding schemes with bounded output is that it
becomes easier to apply standard results such as the Dominated Convergence Theorem to the associated sequence of distortions. 

Lemma 2 (Bounded Output Lemma). Suppose $(X_1[t], \ldots, X_k[t])$ has an arbitrary joint distribution with covariance matrix
$K$ and a coding scheme $\mathcal{C}$ with block length $n$ achieves distortion vector $(D_1, \ldots, D_k)$. Then, for any $\epsilon > 0$, one can build
another coding scheme $\tilde{\mathcal{C}}$ of block length $n$ with decoding functions $\tilde{g}_{d_m}$ with the property that

$$\|\tilde{g}_{d_j}(y_1, \ldots, y_n)\|_\infty \le M,$$

for any $(y_1, \ldots, y_n) \in \mathbb{R}^n$, $j = 1, \ldots, k$, and a fixed $M > 0$, which achieves distortion vector $(D_1 + \epsilon, \ldots, D_k + \epsilon)$.
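A bounded-output decoder can be pictured as an ordinary decoder whose reconstruction is clipped to $[-M, M]$ componentwise. The sketch below shows such a wrapper. Clipping is an assumption made for illustration; the text here does not state how the bounded scheme in Lemma 2 is actually constructed, and the decoder used in the example is hypothetical.

```python
def clip_decoder(decoder, m):
    """Wrap a decoder so that every output component lies in [-m, m]."""
    def bounded(y):
        return [max(-m, min(m, v)) for v in decoder(y)]
    return bounded

# Hypothetical unbounded decoder, used only to exercise the wrapper.
def scale_decoder(y):
    return [2.0 * v for v in y]

bounded_decoder = clip_decoder(scale_decoder, m=3.0)
reconstruction = bounded_decoder([1.0, 5.0, -4.0])
```

The wrapped decoder satisfies the sup-norm bound by construction, which is the property needed to apply dominated-convergence arguments to the resulting distortion.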

Another important property that we need to assume for the original coding scheme designed for a Gaussian model is that
of finite reading precision, which was introduced in [5]. In coding schemes with finite precision, the encoding and decoding
operations of the nodes in the network may only take finite precision inputs; i.e., inputs with a finite number of decimal places.
More formally, for a real-valued vector $x^n = (x_1, \ldots, x_n)$ and a positive integer $p$, we let $\lfloor x^n \rfloor_p = 2^{-p}(\lfloor 2^p x_1 \rfloor, \ldots, \lfloor 2^p x_n \rfloor)$,
and define the following.

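Definition 6. A coding scheme $\mathcal{C}$ of block length $n$ is said to have finite reading precision $p = [p_1, \ldots, p_N] \in \mathbb{N}^N$ if the
encoding function at each source $s_m \in \mathcal{S}$ satisfies

$$f_{s_m,t}(x_m^n, y^{t-1}) = f_{s_m,t}\big(x_m^n, \lfloor y^{t-1} \rfloor_{p_{s_m}}\big),$$

and the encoding function at each node $i \in \mathcal{R} \cup \mathcal{D}$ satisfies

$$f_{i,t}(y^{t-1}) = f_{i,t}\big(\lfloor y^{t-1} \rfloor_{p_i}\big),$$

for any $x_m^n \in \mathbb{R}^n$, any $y^{t-1} \in \mathbb{R}^{t-1}$, and any time $t$.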

While finite reading precision is useful in the proof of Theorem 2, to prove Theorem 1 we instead require that the source
nodes only have access to a finite number of decimal places of the source symbols. We call this finite encoding precision.

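Definition 7. A coding scheme $\mathcal{C}$ with block length $n$ is said to have finite encoding precision $p = [p_1, \ldots, p_k] \in \mathbb{N}^k$ if the
encoding function at each source $s_m \in \mathcal{S}$ satisfies

$$f_{s_m,t}(x_m^n, y^{t-1}) = f_{s_m,t}\big(\lfloor x_m^n \rfloor_{p_m}, y^{t-1}\big), \quad \forall\, m \in [1:k],$$

for any $x_m^n \in \mathbb{R}^n$, any $y^{t-1} \in \mathbb{R}^{t-1}$, and any time $t$.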
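The truncation operator and the finite encoding precision property can be sketched in a few lines. In the example below, `quantize` keeps $p$ binary places of each component, and `with_encoding_precision` wraps an arbitrary encoder so that it only sees the truncated source. Both names, and the toy encoder in the usage lines, are hypothetical and chosen only for illustration.

```python
import math

def quantize(x, p):
    """Return 2^-p * floor(2^p * x_i) for each component of x."""
    scale = 2 ** p
    return [math.floor(scale * v) / scale for v in x]

def with_encoding_precision(encoder, p):
    """Wrap encoder(x, y_past) so it reads the source only through quantize(x, p)."""
    def wrapped(x, y_past):
        return encoder(quantize(x, p), y_past)
    return wrapped

# Toy encoder (hypothetical): transmit the sum of the source block.
toy_encoder = with_encoding_precision(lambda x, y_past: sum(x), p=2)

truncated = quantize([0.3, -0.3, 1.75], 2)
same_output = toy_encoder([0.3], []) == toy_encoder([0.25], [])
```

Any encoder produced by this wrapper satisfies the identity in Definition 7 by construction, since truncating an already truncated vector leaves it unchanged.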

In order to prove the optimality of coding schemes with finite reading/encoding precision (i.e., that they can come arbitrarily 
close to achieving any point in the achievable distortion region), our main tool is the following result. 

Lemma 3. Suppose $\mathbf{Y} = (Y_1, \ldots, Y_i, \ldots, Y_k)$ is a random vector with density $f_{Y_1,\ldots,Y_i,\ldots,Y_k}$. Consider some $p \in \mathbb{N}$. For some
$i \in [1:k]$, let $Y_i^{(p)} = \lfloor Y_i \rfloor_p + U_p$, where $U_p$ is uniformly distributed in $(-2^{-p-1}, 2^{-p-1})$ and is independent of $\mathbf{Y}$. Then

$$\lim_{p \to \infty} f_{Y_1,\ldots,Y_i^{(p)},\ldots,Y_k}(y_1, \ldots, y_i, \ldots, y_k) = f_{Y_1,\ldots,Y_i,\ldots,Y_k}(y_1, \ldots, y_i, \ldots, y_k), \quad \forall\, i \in [1:k], \tag{7}$$

for almost every $(y_1, \ldots, y_i, \ldots, y_k) \in \mathbb{R}^k$.

This lemma allows us to take a coding scheme that does not have finite precision, and consider finer and finer discretizations 
of its encoding functions, in a way that the resulting distortion tuple approaches that of the original coding scheme. 



Another technical tool that is useful in proving the optimality of coding schemes with finite encoding precision is the following 
lemma. Intuitively, it allows us to view our stochastic network (as defined in Section II) as a collection of deterministic networks,
which facilitates the bounding of the resulting distortion. 

Lemma 4 (Functional Representation Lemma). For any two random vectors Y and U, there exist a (deterministic, measurable) 
function h and a random vector Q, independent of U, for which the pair (h(U, Q), U) has the same distribution as (Y, U). 
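For a scalar channel with a continuous conditional distribution, one familiar way to realize such a representation is through the inverse conditional CDF driven by an independent uniform random variable. The sketch below does this for a hypothetical channel in which, given $U = u > 0$, the output is exponentially distributed with rate $u$. The channel, the function name, and the use of a lowercase `q` for the auxiliary randomness are assumptions made for this example only.

```python
import math
import random

def h(u, q):
    """Inverse-CDF map: if q is Uniform(0, 1), then h(u, q) is Exponential(rate=u)."""
    return -math.log(1.0 - q) / u

rng = random.Random(1)
u = 2.0
# q is drawn independently of u, so (h(u, q), u) mimics the pair (Y, U).
samples = [h(u, rng.random()) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)   # should be close to 1 / u
```

Conditioning on a realization of the auxiliary randomness turns the channel into a deterministic map of its input, which is the viewpoint the lemma makes available for general networks.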

Lemmas 2, 3, and 4 allow us to prove the essential optimality of finite precision coding schemes, stated in the next two
lemmas, whose proofs are presented in Appendices E and F.

Lemma 5 (Finite Encoding Precision Lemma). Suppose the distortion tuple $(D_1, \ldots, D_k)$ is achievable over the $(k, N)$-
memoryless network. Then, for any $\epsilon > 0$, there exists a coding scheme with finite encoding precision that achieves distortion
tuple $(D_1 + \epsilon, \ldots, D_k + \epsilon)$.

Lemma 6 (Finite Reading Precision Lemma). Suppose the distortion tuple $(D_1, \ldots, D_k)$ is achievable over the $(k, N)$-
AWGN network. Then, for any $\epsilon > 0$, there exists a coding scheme with finite reading precision that achieves distortion tuple
$(D_1 + \epsilon, \ldots, D_k + \epsilon)$.

Remark. We point out that, from the proofs of Lemmas 2, 5, and 6, it can be seen that there exists a single coding scheme
that has both bounded outputs and finite encoding/reading precision and achieves distortion tuple $(D_1 + \epsilon, \ldots, D_k + \epsilon)$.

In order to state the next result, we use the following definition. 
Definition 8. A function $f : \mathbb{R}^a \to \mathbb{R}^b$ is locally constant at a point $x \in \mathbb{R}^a$ if it is constant in some neighborhood of $x$.

The importance of finite encoding/reading precision is expressed in the following lemma. 
Lemma 7 (Continuity Lemma). If a function $f : \mathbb{R}^a \to \mathbb{R}^b$ satisfies

$$f(x) = f(\lfloor x \rfloor_p)$$

for some $p \in \mathbb{N}$ and any $x \in \mathbb{R}^a$, then $f$ is locally constant (and, thus, continuous) almost everywhere.
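The local constancy in Lemma 7 is easy to observe on a toy function. In the sketch below, `f` reads its input only through the truncation to $p = 3$ binary places, so small perturbations of a point that is not on the grid $2^{-p}\mathbb{Z}$ leave the output unchanged, while a perturbation across a grid point changes it. The function `f` and the test points are hypothetical.

```python
import math

def quantize(x, p):
    """Return 2^-p * floor(2^p * x_i) for each component of x."""
    scale = 2 ** p
    return [math.floor(scale * v) / scale for v in x]

def f(x, p=3):
    """Toy function satisfying f(x) = f(quantize(x, p))."""
    return sum(v * v for v in quantize(x, p))

x = [0.30, -1.10]                        # no coordinate is a multiple of 2**-3
nearby = [0.30 + 1e-6, -1.10 - 1e-6]     # small perturbation of x
locally_constant_here = f(x) == f(nearby)

# At a grid point the function may jump: 0.25 is a multiple of 2**-3.
jumps_at_grid_point = f([0.25]) != f([0.25 - 1e-6])
```

The exceptional points form a set of measure zero (the union of the grid hyperplanes), which is why the lemma holds almost everywhere rather than everywhere.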

Therefore, coding schemes with finite precision have encoding functions that are continuous almost everywhere. As a result,
we may start off the proofs of Theorems 1 and 2 with a coding scheme (designed for a Gaussian model) that has bounded
outputs and finite encoding precision (in the case of Theorem 1) or finite reading precision (in the case of Theorem 2).
The continuity of these functions allows us to bound the distortion achieved by the coding scheme constructed through the
application of the linear transformation $Q$ and the interleaving scheme, and show that it converges to the distortion of the
original coding scheme as $b \to \infty$, by invoking the results pertaining to weak convergence to the Gaussian distribution in the
transform domain.

IV. Worst Case Source for a given Network 

In this section, we prove Theorem 1, that is, that among all the sources with a given covariance, the Gaussian source is least
compressible over a fixed memoryless network. Note that proving Theorem 1 is equivalent to proving the following theorem.

Theorem 3 (Equivalent to Theorem 1). If a distortion tuple $(D_1, \ldots, D_k)$ is $\xi$-achievable when $(X_1, \ldots, X_k)$ is jointly
Gaussian with covariance matrix $K$, then for any $\epsilon > 0$, the distortion tuple $(D_1 + \epsilon, \ldots, D_k + \epsilon)$ is $\xi$-achievable when
$(X_1, \ldots, X_k)$ has an arbitrary distribution with covariance matrix $K$.

Before delving into the mathematical details of the proof below, we give an overview of the proof steps. A high-level
illustration is provided in Fig. 2(a). The overall idea is to use the achievable scheme for distributed compression of the
Gaussian source over the network, and devise a new scheme for the arbitrary source with the same covariance to show that
the achievable distortion in both cases is similar.

We first apply the linear transformation Q to blocks of the sources, and then the transformed source symbols from various 
blocks are interleaved to create new blocks of independent "effective" source symbols. The idea is that each of these new 
effective source symbols is now a weighted aggregate of many source symbols, and using a central limit theorem-like result 
(Lemma 1), the effective source is close to the Gaussian source with the same covariance. We next invoke the Functional
Representation Lemma (Lemma 4) to "transform" the given stochastic network into a randomly chosen deterministic network;
that is to say that the output to a node is a deterministic function of the inputs to the network and some external randomness,
independent of the inputs. We then take the achievable scheme for the Gaussian source, and construct an equivalent scheme
with bounded output (using Lemma 2) and finite encoding precision (using Lemma 5), and apply it to the new effective network
with approximately Gaussian sources. Using the continuity property of finite encoding precision schemes (Lemma 7), and the
property of bounded output, we conclude the proof by showing that the distortion achieved on this effective network with 
approximately Gaussian sources is close to what it would have been if the sources were actually Gaussian. 
Fig. 2. Flow diagrams showing the overview of the proofs of Theorems 1 and 2. (a) Overview of the proof of Theorem 1. (b) Overview of the proof of Theorem 2.



Proof of Theorem 3. Suppose the distortion tuple $(D_1, \ldots, D_k)$ is achievable in the case where $(X_1[0], \ldots, X_k[0])$ is
jointly Gaussian with covariance matrix $K$. Fix $\epsilon > 0$. From Lemmas 2 and 5 (and the subsequent remark), we can assume that
we have a code $\mathcal{C}$ with block length $n$, as defined in Definition 1, which achieves distortion vector $(D_1 + \epsilon/2, \ldots, D_k + \epsilon/2)$
if $(X_1[0], \ldots, X_k[0])$ is jointly Gaussian, with finite encoding precision $p = [p_1, \ldots, p_k] \in \mathbb{N}^k$ and for which

$$\|g_{d_j}(y_1, \ldots, y_n)\|_\infty \le M,$$

for any $(y_1, \ldots, y_n) \in \mathbb{R}^n$, $j = 1, \ldots, k$, and a fixed $M > 0$.

We will build a coding scheme $\tilde{\mathcal{C}}$ with block length $nb$, for a large integer $b$, with source encoding functions $\tilde{f}_{s_m,t}$, relay
encoding functions $\tilde{f}_{r_p,t}$, destination encoding functions $\tilde{f}_{d_m,t}$, and destination decoding functions $\tilde{g}_{d_m}$. Since we will be
working with a block length $nb$, we will let $\mathbf{X}_m = (X_m[0], \ldots, X_m[nb-1])$, for $m = 1, \ldots, k$. All relay encoding functions
and destination encoding functions will be constructed by simply repeating $f_{r_p,t}$ and $f_{d_m,t}$, $b$ times. More precisely, for a time
$t = \ell n + \tau$, for $\ell \in \{0, \ldots, b-1\}$ and $\tau \in \{0, \ldots, n-1\}$, we let

$$\tilde{f}_{r_p,t}(y^{t-1}) = f_{r_p,\tau}\big(y[\ell n], y[\ell n + 1], \ldots, y[\ell n + \tau - 1]\big),$$
$$\tilde{f}_{d_m,t}(y^{t-1}) = f_{d_m,\tau}\big(y[\ell n], y[\ell n + 1], \ldots, y[\ell n + \tau - 1]\big).$$

Thus, from the point of view of the relay and destination encoding functions, we are simply repeating coding scheme $\mathcal{C}$, $b$
independent times.
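The repetition of the relay and destination encoding functions amounts to decomposing the time index as $t = \ell n + \tau$ and feeding the original time-$\tau$ encoder only the outputs received since the current length-$n$ block began. A minimal sketch follows; the calling convention `f(tau, y_block)` for the original encoders is an assumption made for this example.

```python
def repeat_encoder(f, n):
    """Build the repeated encoder from the original per-block encoder.

    f(tau, y_block) is the original encoder at time tau, where y_block holds
    the channel outputs observed since the start of the current block.
    The returned function takes the absolute time t and all past outputs.
    """
    def f_tilde(t, y_past):
        ell, tau = divmod(t, n)              # t = ell * n + tau
        start = ell * n
        return f(tau, y_past[start:start + tau])
    return f_tilde

# Toy original encoder (hypothetical): echo its arguments for inspection.
echo = lambda tau, y_block: (tau, list(y_block))
f_tilde = repeat_encoder(echo, n=3)
```

For instance, at time $t = 4$ with $n = 3$ the repeated encoder is in block $\ell = 1$ at local time $\tau = 1$, so it only reads $y[3]$ and ignores everything received during the first block.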

As in the construction of the relay and destination encoding functions, we will essentially repeat $f_{s_m}$, $b$ times. However,
instead of applying each of these source encoding functions to the random source $\mathbf{X}_m$, each source $s_m$ will instead
apply it to an effective random source $\tilde{\mathbf{X}}_m$, which can be obtained from $\mathbf{X}_m$ through an invertible transformation. This invertible
transformation is depicted in Fig. 3. First, $\mathbf{X}_m$ is broken into $n$ blocks of length $b$. To each of these blocks we apply the unitary
linear transformation $Q$. The $n$ resulting blocks of length $b$ are then interleaved, generating $b$ length-$n$ vectors $\tilde{\mathbf{X}}_m^{(0)}, \ldots, \tilde{\mathbf{X}}_m^{(b-1)}$,
as shown in Fig. 3. The new effective source is given by $\tilde{\mathbf{X}}_m = (\tilde{\mathbf{X}}_m^{(0)}, \ldots, \tilde{\mathbf{X}}_m^{(b-1)})$. Therefore, for a time $t = \ell n + \tau$, for
$\ell \in \{0, \ldots, b-1\}$ and $\tau \in \{0, \ldots, n-1\}$, the source encoding function can be described as

$$\tilde{f}_{s_m,t}(\mathbf{X}_m, y^{t-1}) = f_{s_m,\tau}\big(\tilde{\mathbf{X}}_m^{(\ell)}, y[\ell n], y[\ell n + 1], \ldots, y[\ell n + \tau - 1]\big).$$
Fig. 3. Illustration of the new encoding procedure for source node $s_m$.



At the decoder side, we first apply the destination decoding function $g_{d_m}$ to each block of length $n$, and then we invert the transformation applied by source $s_m$ to $\mathbf{X}_m$. More precisely, $\tilde g_{d_m}$ is obtained by taking the length-$nb$ vector

$$\big(g_{d_m}(Y_{d_m}[0], \ldots, Y_{d_m}[n-1]),\; g_{d_m}(Y_{d_m}[n], \ldots, Y_{d_m}[2n-1]),\; \ldots,\; g_{d_m}(Y_{d_m}[(b-1)n], \ldots, Y_{d_m}[bn-1])\big),$$

interleaving the $b$ blocks of length $n$ to obtain $n$ blocks of length $b$, and then applying $Q^{-1}$ to each of these blocks.
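
As an illustration only, the block-transform-and-interleave step at the source and its inverse at the destination can be sketched in Python. The sketch assumes NumPy and uses a random real orthogonal matrix as a stand-in for $Q$ (the actual $Q$ is the one defined in (5)); the function names `stand_in_q`, `to_effective` and `from_effective` are hypothetical.

```python
import numpy as np

def stand_in_q(b, seed=0):
    """Hypothetical stand-in for Q: a random real orthogonal b x b matrix."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((b, b)))
    return q

def to_effective(x, n, b, Q):
    """Split x (length n*b) into n blocks of length b, apply Q to each block,
    then interleave: row l of the result is the l-th length-n effective vector."""
    blocks = np.asarray(x, dtype=float).reshape(n, b)
    transformed = blocks @ Q.T          # row i equals Q @ blocks[i]
    return transformed.T                # shape (b, n)

def from_effective(x_eff, Q):
    """Undo to_effective: de-interleave, then apply Q^{-1} = Q^T to each block."""
    transformed = np.asarray(x_eff, dtype=float).T   # shape (n, b)
    blocks = transformed @ Q            # row i equals Q.T @ transformed[i]
    return blocks.reshape(-1)

n, b = 4, 8
Q = stand_in_q(b)
x = np.random.default_rng(1).standard_normal(n * b)
x_eff = to_effective(x, n, b, Q)
x_back = from_effective(x_eff, Q)
```

Because the stand-in matrix is orthogonal, the squared norm of the effective source equals that of the original source, which is the property the distortion argument relies on.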

Our next goal is to show that, by choosing $b$ large enough, we can make the distortion of this new code arbitrarily close to the distortion of the original code applied to the Gaussian source. We will employ the Functional Representation Lemma (Lemma 4) in order to alternatively view our stochastic network as an ensemble of deterministic networks. Note that the achievable distortion of the network under consideration is a function of the joint distribution of sources, channel inputs and channel outputs, i.e., $(X_1, \ldots, X_k, Y_1, \ldots, Y_N, U_1, \ldots, U_N)$. Hence, from Lemma 4, the memoryless network $f_{Y_1,\ldots,Y_N|U_1,\ldots,U_N}$ can be equivalently represented as a deterministic network $Y_i = h_i(U_1, \ldots, U_N, Z)$, $\forall\, i \in [1:N]$, where $Z$ is a random vector independent of the channel inputs $(U_1, \ldots, U_N)$. Thus, given a length-$nb$ vector of realizations of $Z$, $\mathbf{z} = (z_0, \ldots, z_{nb-1})$, and given our coding scheme $\tilde C$, all the received signals in the network are a deterministic function of the random sources $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_k$. Thus for a block length $nb$, we can write, for some functions $F_i$, $\mathbf{Y}_i = F_i(\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_k, \mathbf{Z})$, $\forall\, i \in [1:N]$. Therefore, once we condition on $\mathbf{Z} = \mathbf{z}$ for some realization $\mathbf{z}$ of $\mathbf{Z}$, the distortion of the reconstruction of destination $d_m$,

$$\frac{1}{nb}\left\|\mathbf{X}_m - \tilde g_{d_m}\big(F_{d_m}(\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_k, \mathbf{z})\big)\right\|^2,$$

is only a function of the random sources $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_k$.

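Next, we notice that, since $Q$ is a unitary linear transformation, the distortion of $\tilde C$ can be written in terms of $\tilde{\mathbf{X}}_m^{(\ell)}$, for $\ell = 0, \ldots, b-1$, as

$$\frac{1}{b}\sum_{\ell=0}^{b-1}\frac{1}{n}\left\|\tilde{\mathbf{X}}_m^{(\ell)} - g_{d_m}\big(F_{d_m}(\tilde{\mathbf{X}}_1^{(\ell)}, \ldots, \tilde{\mathbf{X}}_k^{(\ell)}, \mathbf{z})\big)\right\|^2,$$

where the $F_{d_m}$ are defined again through the Functional Representation Lemma, by noticing that, during times $t = \ell n, \ell n+1, \ldots, (\ell+1)n-1$, the received signals at $d_m$ only depend on $(\tilde{\mathbf{X}}_1^{(\ell)}, \ldots, \tilde{\mathbf{X}}_k^{(\ell)})$ and not on the entire sources $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_k$.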
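For each $b = 1, 2, \ldots$, we will let

$$\ell_b = \arg\max_{0 \le \ell \le b-1} E\left\|\tilde{\mathbf{X}}_m^{(\ell)} - g_{d_m}\big(F_{d_m}(\tilde{\mathbf{X}}_1^{(\ell)}, \ldots, \tilde{\mathbf{X}}_k^{(\ell)}, \mathbf{Z})\big)\right\|^2,$$

i.e., the $\ell_b$th length-$n$ block has the largest expected distortion. Note that $\big\{(\tilde X_1^{(\ell_b)}[i], \ldots, \tilde X_k^{(\ell_b)}[i])\big\}_{i=0}^{n-1}$ is an i.i.d. sequence of length-$k$ random vectors. From Lemma 1 we see that it converges in distribution to a sequence of i.i.d. jointly Gaussian random vectors with covariance matrix $K$, as $b \to \infty$.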
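
The Gaussianizing effect invoked here can be checked numerically. The following is a rough sketch, not the construction of the paper: it assumes NumPy, uses a normalized Hadamard matrix as a hypothetical stand-in for $Q$ (all entries have magnitude $1/\sqrt{b}$), and uses excess kurtosis as a crude measure of distance from Gaussianity.

```python
import numpy as np

def normalized_hadamard(b):
    """Orthonormal b x b matrix with entries +-1/sqrt(b); b must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < b:
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    if H.shape[0] != b:
        raise ValueError("b must be a power of two")
    return H / np.sqrt(b)

def excess_kurtosis(x):
    """Sample excess kurtosis; it is 0 for a Gaussian distribution."""
    x = np.ravel(x) - np.mean(x)
    return float(np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0)

rng = np.random.default_rng(0)
b, n_blocks = 64, 20000
# i.i.d. uniform source with zero mean and unit variance (clearly non-Gaussian).
x = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n_blocks, b))
Q = normalized_hadamard(b)
x_eff = x @ Q.T                      # apply the transform to every length-b block

kurt_before = excess_kurtosis(x)
kurt_after = excess_kurtosis(x_eff)
var_after = float(np.var(x_eff))
```

In this sketch the variance is unchanged by the transform while the excess kurtosis moves from roughly that of a uniform distribution towards zero, consistent with each effective sample being a weighted sum of $b$ independent source samples.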

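Now, from Lemma 7, each of the source encoding functions $f_{s_m,t}$ of the original coding scheme $C$ is constant almost everywhere, since it has finite encoding precision. Since each function $F_i$ can only depend on the effective sources $\tilde{\mathbf{X}}_1^{(\ell_b)}, \ldots, \tilde{\mathbf{X}}_k^{(\ell_b)}$ through the source encoding functions $f_{s_m,t}$, it is not difficult to see that, for a fixed $\mathbf{z}$, $F_i$ is an almost-everywhere-constant function of $\tilde{\mathbf{X}}_1^{(\ell_b)}, \ldots, \tilde{\mathbf{X}}_k^{(\ell_b)}$. Therefore, the mapping

$$\big(\tilde{\mathbf{X}}_1^{(\ell_b)}, \ldots, \tilde{\mathbf{X}}_k^{(\ell_b)}\big) \mapsto \left\|\tilde{\mathbf{X}}_m^{(\ell_b)} - g_{d_m}\big(F_{d_m}(\tilde{\mathbf{X}}_1^{(\ell_b)}, \ldots, \tilde{\mathbf{X}}_k^{(\ell_b)}, \mathbf{z})\big)\right\|^2,$$

for $m = 1, \ldots, k$, is continuous almost everywhere. We conclude that

$$\left\|\tilde{\mathbf{X}}_m^{(\ell_b)} - g_{d_m}\big(F_{d_m}(\tilde{\mathbf{X}}_1^{(\ell_b)}, \ldots, \tilde{\mathbf{X}}_k^{(\ell_b)}, \mathbf{z})\big)\right\|^2 \xrightarrow{d} \left\|\mathbf{X}_m^G - g_{d_m}\big(F_{d_m}(\mathbf{X}_1^G, \ldots, \mathbf{X}_k^G, \mathbf{z})\big)\right\|^2$$

as $b \to \infty$, where $\mathbf{X}_m^G = (X_m^G[0], \ldots, X_m^G[n-1])$, for $m = 1, \ldots, k$, and $\big\{(X_1^G[i], \ldots, X_k^G[i])\big\}_{i=0}^{n-1}$ is an i.i.d. sequence such that $(X_1^G[0], \ldots, X_k^G[0])$ is jointly Gaussian with zero mean and covariance matrix $K$. Moreover, we have that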



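$$\left\|\tilde{\mathbf{X}}_m^{(\ell_b)} - g_{d_m}\big(F_{d_m}(\tilde{\mathbf{X}}_1^{(\ell_b)}, \ldots, \tilde{\mathbf{X}}_k^{(\ell_b)}, \mathbf{z})\big)\right\|^2 \le 2\left\|\tilde{\mathbf{X}}_m^{(\ell_b)}\right\|^2 + 2\left\|g_{d_m}\big(F_{d_m}(\tilde{\mathbf{X}}_1^{(\ell_b)}, \ldots, \tilde{\mathbf{X}}_k^{(\ell_b)}, \mathbf{z})\big)\right\|^2 \le 2\left\|\tilde{\mathbf{X}}_m^{(\ell_b)}\right\|^2 + 2nM^2, \tag{8}$$

and also that

$$E\left\|\tilde{\mathbf{X}}_m^{(\ell_b)}\right\|^2 = n\,E\left[\Big(\sum_{j=0}^{b-1} X_m[j]\,Q(\ell_b, j)\Big)^2\right] = n K_{m,m} \sum_{j=0}^{b-1} Q^2(\ell_b, j) = n K_{m,m} < \infty. \tag{9}$$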



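Thus, from a variation of the Dominated Convergence Theorem (see Appendix H), we conclude that, as $b \to \infty$,

$$E\left[\left\|\tilde{\mathbf{X}}_m^{(\ell_b)} - g_{d_m}\big(F_{d_m}(\tilde{\mathbf{X}}_1^{(\ell_b)}, \ldots, \tilde{\mathbf{X}}_k^{(\ell_b)}, \mathbf{Z})\big)\right\|^2 \,\Big|\, \mathbf{Z} = \mathbf{z}\right] \to E\left[\left\|\mathbf{X}_m^G - g_{d_m}\big(F_{d_m}(\mathbf{X}_1^G, \ldots, \mathbf{X}_k^G, \mathbf{Z})\big)\right\|^2 \,\Big|\, \mathbf{Z} = \mathbf{z}\right] \tag{10}$$

for all $\mathbf{z}$. This means that the random variable $E\big[\|\tilde{\mathbf{X}}_m^{(\ell_b)} - g_{d_m}(F_{d_m}(\tilde{\mathbf{X}}_1^{(\ell_b)}, \ldots, \tilde{\mathbf{X}}_k^{(\ell_b)}, \mathbf{Z}))\|^2 \,\big|\, \mathbf{Z}\big]$ converges surely to $E\big[\|\mathbf{X}_m^G - g_{d_m}(F_{d_m}(\mathbf{X}_1^G, \ldots, \mathbf{X}_k^G, \mathbf{Z}))\|^2 \,\big|\, \mathbf{Z}\big]$. Moreover, for any $b$, by following the steps in (8) and (9), since $\tilde{\mathbf{X}}_m^{(\ell_b)}$ is independent of $\mathbf{Z}$,

$$E\left[\left\|\tilde{\mathbf{X}}_m^{(\ell_b)} - g_{d_m}\big(F_{d_m}(\tilde{\mathbf{X}}_1^{(\ell_b)}, \ldots, \tilde{\mathbf{X}}_k^{(\ell_b)}, \mathbf{Z})\big)\right\|^2 \,\Big|\, \mathbf{Z} = \mathbf{z}\right] \le 2\,E\left[\left\|\tilde{\mathbf{X}}_m^{(\ell_b)}\right\|^2 \,\Big|\, \mathbf{Z} = \mathbf{z}\right] + 2nM^2 = 2\,E\left\|\tilde{\mathbf{X}}_m^{(\ell_b)}\right\|^2 + 2nM^2 \le 2n\big(K_{m,m} + M^2\big).$$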



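Thus, a second application of the Dominated Convergence Theorem yields

$$E\left[E\left[\left\|\tilde{\mathbf{X}}_m^{(\ell_b)} - g_{d_m}\big(F_{d_m}(\tilde{\mathbf{X}}_1^{(\ell_b)}, \ldots, \tilde{\mathbf{X}}_k^{(\ell_b)}, \mathbf{Z})\big)\right\|^2 \Big| \mathbf{Z}\right]\right] \to E\left[E\left[\left\|\mathbf{X}_m^G - g_{d_m}\big(F_{d_m}(\mathbf{X}_1^G, \ldots, \mathbf{X}_k^G, \mathbf{Z})\big)\right\|^2 \Big| \mathbf{Z}\right]\right],$$

which implies

$$E\left\|\tilde{\mathbf{X}}_m^{(\ell_b)} - g_{d_m}\big(F_{d_m}(\tilde{\mathbf{X}}_1^{(\ell_b)}, \ldots, \tilde{\mathbf{X}}_k^{(\ell_b)}, \mathbf{Z})\big)\right\|^2 \to E\left\|\mathbf{X}_m^G - g_{d_m}\big(F_{d_m}(\mathbf{X}_1^G, \ldots, \mathbf{X}_k^G, \mathbf{Z})\big)\right\|^2 \le n(D_m + \epsilon/2).$$

Therefore, we can choose $b$ sufficiently large so that

$$\frac{1}{n} E\left\|\tilde{\mathbf{X}}_m^{(\ell_b)} - g_{d_m}\big(F_{d_m}(\tilde{\mathbf{X}}_1^{(\ell_b)}, \ldots, \tilde{\mathbf{X}}_k^{(\ell_b)}, \mathbf{Z})\big)\right\|^2 \le \frac{1}{n} E\left\|\mathbf{X}_m^G - g_{d_m}\big(F_{d_m}(\mathbf{X}_1^G, \ldots, \mathbf{X}_k^G, \mathbf{Z})\big)\right\|^2 + \epsilon/2 \le D_m + \epsilon.$$

The expected distortion of code $\tilde C$ (with blocklength $nb$) thus satisfies

$$\frac{1}{b}\sum_{\ell=0}^{b-1}\frac{1}{n} E\left\|\tilde{\mathbf{X}}_m^{(\ell)} - g_{d_m}\big(F_{d_m}(\tilde{\mathbf{X}}_1^{(\ell)}, \ldots, \tilde{\mathbf{X}}_k^{(\ell)}, \mathbf{Z})\big)\right\|^2 \le \frac{1}{n} E\left\|\tilde{\mathbf{X}}_m^{(\ell_b)} - g_{d_m}\big(F_{d_m}(\tilde{\mathbf{X}}_1^{(\ell_b)}, \ldots, \tilde{\mathbf{X}}_k^{(\ell_b)}, \mathbf{Z})\big)\right\|^2 \le D_m + \epsilon,$$

for $m = 1, \ldots, k$. This concludes the proof of Theorem 1.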
V. Worst Case Additive Noise for a given Source

In this section, we prove Theorem 2, that is, that given any arbitrary source to be encoded over an additive-noise network, amongst all the noise distributions with a given covariance, Gaussian noise leads to the worst distortion. Note that proving Theorem 2 is equivalent to proving the following theorem.

Theorem 4 (Equivalent to Theorem 2). If a distortion tuple $(D_1, \ldots, D_k)$ is achievable on an additive-noise memoryless network when $(Z_1, \ldots, Z_N)$ is jointly Gaussian with covariance matrix $K$, then for any $\epsilon > 0$, the distortion tuple $(D_1 + \epsilon, \ldots, D_k + \epsilon)$ is achievable when $(Z_1, \ldots, Z_N)$ has an arbitrary distribution with covariance matrix $K$.



As we did in the previous section, we first give an overview of the proof. A high-level illustration is provided in Fig. 2(b). The overall idea is to take an achievable scheme for the distributed compression of the source over the additive Gaussian noise network and devise a new scheme for the case where the noise is arbitrarily distributed with the same covariance, so that the achievable distortion in both cases is comparable.

We first apply the linear transformation $Q$ to blocks of the network inputs and outputs to create effective inputs and outputs. Our goal is to convert the additive-noise network into an approximately additive Gaussian noise network. The key idea is that the new effective channel noise is now a weighted aggregate of many independent noise components and, by a central-limit-theorem-like result (Lemma 1), the effective noise is close to Gaussian with the same covariance. We then take the achievable scheme for the Gaussian noise, construct an equivalent scheme with bounded output (using Lemma 2) and finite reading precision (using Lemma 6), and apply it to the new effective network with approximately Gaussian noise. Using the continuity property of finite reading precision schemes (Lemma 7) and the property of bounded output, we conclude the proof by showing that the distortion achieved on this effective network with approximately Gaussian noise is close to what it would have been if the noise were actually Gaussian.

Proof of Theorem 4: We start by noticing that, if the distortion tuple $(D_1, \ldots, D_k)$ is achievable in the case where $(Z_1[0], \ldots, Z_N[0])$ is jointly Gaussian with covariance matrix $K$, then Lemmas 2 and 6 imply that we have a code $C$ with block length $n$ and finite reading precision $p$, which achieves distortion vector $(D_1 + \epsilon/2, \ldots, D_k + \epsilon/2)$ and for which

$$\left\|g_{d_j}(y_1, \ldots, y_n)\right\|_\infty \le M,$$

for any $(y_1, \ldots, y_n) \in \mathbb{R}^n$, $j = 1, \ldots, k$ and a fixed $M > 0$.

In order to build our new coding scheme $\tilde C$, we will first define operations that will be applied to transmit and received signals in the network. For an integer $b$, these operations will effectively create $b$ new memoryless networks, which will replace the input-output relationship in (3) with

$$\begin{bmatrix} Y_1^{(\ell)} \\ \vdots \\ Y_N^{(\ell)} \end{bmatrix} = H \begin{bmatrix} U_1^{(\ell)} \\ \vdots \\ U_N^{(\ell)} \end{bmatrix} + \begin{bmatrix} Z_1^{(\ell)} \\ \vdots \\ Z_N^{(\ell)} \end{bmatrix} \tag{11}$$

for $\ell = 0, 1, \ldots, b-1$, where $U_1^{(\ell)}, \ldots, U_N^{(\ell)}$ are the effective inputs, $Y_1^{(\ell)}, \ldots, Y_N^{(\ell)}$ the effective outputs, and $(Z_1^{(\ell)}, \ldots, Z_N^{(\ell)})$ the effective additive-noise vector of the $\ell$th effective network. In each of these memoryless networks we will then apply our original coding scheme $C$. The resulting coding scheme $\tilde C$ will thus have block length $nb$.

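The operations applied to transmit signals and received signals will make use of the linear transformation $Q$, defined in (5). Given a block of $nb$ effective inputs to a node $i \in \mathcal{V}$, $U_i^{(\ell)}[0], \ldots, U_i^{(\ell)}[n-1]$, for $\ell = 0, 1, \ldots, b-1$, the actual $nb$ transmit signals of node $i$ are built as

$$\begin{bmatrix} U_i[tb] \\ U_i[tb+1] \\ \vdots \\ U_i[tb+b-1] \end{bmatrix} = Q^{-1} \begin{bmatrix} U_i^{(0)}[t] \\ U_i^{(1)}[t] \\ \vdots \\ U_i^{(b-1)}[t] \end{bmatrix} \tag{12}$$

for $t = 0, 1, \ldots, n-1$. The effective network outputs are built from the actual received signals by setting

$$\begin{bmatrix} Y_i^{(0)}[t] \\ Y_i^{(1)}[t] \\ \vdots \\ Y_i^{(b-1)}[t] \end{bmatrix} = Q \begin{bmatrix} Y_i[tb] \\ Y_i[tb+1] \\ \vdots \\ Y_i[tb+b-1] \end{bmatrix} \tag{13}$$

for $t = 0, 1, \ldots, n-1$. Using (13), (3) and then (12) we can write

$$\begin{bmatrix} Y_1^{(0)}[t] & \cdots & Y_N^{(0)}[t] \\ \vdots & & \vdots \\ Y_1^{(b-1)}[t] & \cdots & Y_N^{(b-1)}[t] \end{bmatrix} = Q \begin{bmatrix} Y_1[tb] & \cdots & Y_N[tb] \\ \vdots & & \vdots \\ Y_1[tb+b-1] & \cdots & Y_N[tb+b-1] \end{bmatrix} \tag{14}$$

$$= Q \left( \begin{bmatrix} U_1[tb] & \cdots & U_N[tb] \\ \vdots & & \vdots \\ U_1[tb+b-1] & \cdots & U_N[tb+b-1] \end{bmatrix} H^T + \begin{bmatrix} Z_1[tb] & \cdots & Z_N[tb] \\ \vdots & & \vdots \\ Z_1[tb+b-1] & \cdots & Z_N[tb+b-1] \end{bmatrix} \right) \tag{15}$$

$$= \begin{bmatrix} U_1^{(0)}[t] & \cdots & U_N^{(0)}[t] \\ \vdots & & \vdots \\ U_1^{(b-1)}[t] & \cdots & U_N^{(b-1)}[t] \end{bmatrix} H^T + Q \begin{bmatrix} Z_1[tb] & \cdots & Z_N[tb] \\ \vdots & & \vdots \\ Z_1[tb+b-1] & \cdots & Z_N[tb+b-1] \end{bmatrix}. \tag{16}$$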



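This shows that we in fact have the effective input-output relationship shown in (11), if we define the effective noise vectors via

$$\begin{bmatrix} Z_1^{(0)}[t] & \cdots & Z_N^{(0)}[t] \\ Z_1^{(1)}[t] & \cdots & Z_N^{(1)}[t] \\ \vdots & & \vdots \\ Z_1^{(b-1)}[t] & \cdots & Z_N^{(b-1)}[t] \end{bmatrix} = Q \begin{bmatrix} Z_1[tb] & \cdots & Z_N[tb] \\ Z_1[tb+1] & \cdots & Z_N[tb+1] \\ \vdots & & \vdots \\ Z_1[tb+b-1] & \cdots & Z_N[tb+b-1] \end{bmatrix}. \tag{17}$$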
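
The effective input-output relationship can be sanity-checked numerically. The sketch below assumes NumPy; the channel matrix `H`, the random orthogonal stand-in for $Q$, and the names `pre` and `post` are all hypothetical. It only relies on the transmit-side and receive-side transforms being inverses of each other, and assigns $Q^T$ to the transmit side and $Q$ to the receive side as one valid choice.

```python
import numpy as np

rng = np.random.default_rng(0)
N, b = 3, 8                                        # nodes, number of effective networks
H = rng.standard_normal((N, N))                    # hypothetical channel matrix
Q, _ = np.linalg.qr(rng.standard_normal((b, b)))   # stand-in orthogonal transform

pre, post = Q.T, Q                                 # any pair with post @ pre = I works

# Row l holds the inputs (U_1, ..., U_N) of effective network l at one effective time t.
U_eff = rng.standard_normal((b, N))
# Row j holds the actual (non-Gaussian) noise at actual time t*b + j.
Z = rng.uniform(-1.0, 1.0, size=(b, N))

U = pre @ U_eff            # actual transmit signals over the b actual time slots
Y = U @ H.T + Z            # actual additive-noise network, one row per time slot
Y_eff = post @ Y           # effective received signals
Z_eff = post @ Z           # effective noise
```

Each effective network then sees the same channel matrix, with a noise vector that is a linear mixture of $b$ independent actual noise vectors.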



In order to apply our original coding scheme $C$ to the $\ell$th effective network, we set the effective transmit signals to be

$$U_{s_m}^{(\ell)}[t] = f_{s_m,t}\big(X_m[\ell n : (\ell+1)n-1],\; Y_{s_m}^{(\ell)}[0 : t-1]\big) \tag{18}$$

for each source $s_m \in \mathcal{S}$, and

$$U_i^{(\ell)}[t] = f_{i,t}\big(Y_i^{(\ell)}[0 : t-1]\big) \tag{19}$$

for any other node $i \in \mathcal{R} \cup \mathcal{D}$. The decoding functions are applied at each destination $d_m \in \mathcal{D}$ as

$$\hat X_m[\ell n : (\ell+1)n-1] = g_{d_m}\big(Y_{d_m}^{(\ell)}[0 : n-1]\big), \tag{20}$$

and each destination can output the estimate $\hat{\mathbf{X}}_m \in \mathbb{R}^{nb}$ by concatenating the length-$n$ output from each of the $b$ effective networks. Next, we check that equations (18), (19) and (20) do not violate causality when they are mapped to the actual transmit and received signals according to (12) and (13). From (12), we notice that, at time $tb$, for $t = 0, \ldots, n-1$, in order to construct the next $b$ transmit signals $(U_i[tb], \ldots, U_i[tb+b-1])$, node $i$ needs the effective transmit signals $(U_i^{(0)}[t], \ldots, U_i^{(b-1)}[t])$. These effective transmit signals, in turn, can be constructed given the effective received signals $Y_i^{(\ell)}[0], \ldots, Y_i^{(\ell)}[t-1]$, $\ell = 0, \ldots, b-1$, from (18) and (19). Finally, from (13), we notice that all these effective received signals can be constructed at the end of time slot $(t-1)b + b - 1 = tb - 1$, and will therefore be available at node $i$ at time $tb$. It can be similarly checked that the decoding operation defined in (20) does not violate causality.

We now analyze the effective noise vectors obtained. The fact that the additive-noise vectors

$$\big(Z_1^{(\ell)}[t], Z_2^{(\ell)}[t], \ldots, Z_N^{(\ell)}[t]\big) \tag{21}$$

for $t = 0, \ldots, n-1$ are i.i.d. follows easily from the fact that they correspond to a row of the matrix on the left-hand side of (17), which is i.i.d. for $t = 0, \ldots, n-1$, since the matrix on the right-hand side is i.i.d. by the definition of the memoryless additive-noise network. Moreover, by comparing (17) with (6), we see that Lemma 1 implies that

$$\big(Z_1^{(\ell_b)}[t], Z_2^{(\ell_b)}[t], \ldots, Z_N^{(\ell_b)}[t]\big) \xrightarrow{d} \mathcal{N}(0, K) \tag{22}$$

as $b \to \infty$, for each $t \in \{0, \ldots, n-1\}$, and any sequence $\ell_b$, $b = 1, 2, \ldots$, such that $\ell_b \in \{0, \ldots, b-1\}$.

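From Lemma 6 we were able to assume that the original coding scheme $C$ has finite reading precision, which implies that the encoding, relaying and decoding functions $f_{s_m,t}$, $f_{r,t}$, $f_{d_m,t}$ and $g_{d_m}$ are continuous almost everywhere. It is then not difficult to see that, for each destination $d_m$, we can write

$$Y_{d_m}^{(\ell)}[0 : n-1] = F_m\big(\mathbf{X}_1^{(\ell)}, \ldots, \mathbf{X}_k^{(\ell)}, \mathbf{Z}_1^{(\ell)}, \ldots, \mathbf{Z}_N^{(\ell)}\big),$$

where $F_m$ is an almost-everywhere-continuous function of $\mathbf{Z}_1^{(\ell)}, \ldots, \mathbf{Z}_N^{(\ell)}$, $\mathbf{X}_i^{(\ell)} = X_i[\ell n : (\ell+1)n-1]$ and $\mathbf{Z}_i^{(\ell)} = Z_i^{(\ell)}[0 : n-1]$, for $\ell = 0, \ldots, b-1$. Therefore, the mapping

$$\big(\mathbf{Z}_1^{(\ell)}, \ldots, \mathbf{Z}_N^{(\ell)}\big) \mapsto \left\|\mathbf{X}_m^{(\ell)} - g_{d_m}\big(F_m(\mathbf{X}_1^{(\ell)}, \ldots, \mathbf{X}_k^{(\ell)}, \mathbf{Z}_1^{(\ell)}, \ldots, \mathbf{Z}_N^{(\ell)})\big)\right\|^2,$$

for $m = 1, \ldots, k$, is continuous almost everywhere as well. We conclude that

$$\left\|\mathbf{X}_m^{(\ell_b)} - g_{d_m}\big(F_m(\mathbf{X}_1^{(\ell_b)}, \ldots, \mathbf{X}_k^{(\ell_b)}, \mathbf{Z}_1^{(\ell_b)}, \ldots, \mathbf{Z}_N^{(\ell_b)})\big)\right\|^2 \xrightarrow{d} \left\|\mathbf{X}_m^{(0)} - g_{d_m}\big(F_m(\mathbf{X}_1^{(0)}, \ldots, \mathbf{X}_k^{(0)}, \mathbf{Z}_1^G, \ldots, \mathbf{Z}_N^G)\big)\right\|^2$$

as $b \to \infty$, where $\mathbf{Z}_i^G = Z_i^G[0 : n-1]$ for $i = 1, \ldots, N$, and $\big\{(Z_1^G[t], \ldots, Z_N^G[t])\big\}_{t=0}^{n-1}$ is an i.i.d. sequence of jointly Gaussian vectors $\mathcal{N}(0, K)$. Moreover, we have that

$$\left\|\mathbf{X}_m^{(\ell_b)} - g_{d_m}\big(F_m(\mathbf{X}_1^{(\ell_b)}, \ldots, \mathbf{X}_k^{(\ell_b)}, \mathbf{Z}_1^{(\ell_b)}, \ldots, \mathbf{Z}_N^{(\ell_b)})\big)\right\|^2 \le 2\left\|\mathbf{X}_m^{(\ell_b)}\right\|^2 + 2nM^2. \tag{23}$$

Thus, from a variant of the Dominated Convergence Theorem (see Appendix H), we conclude that, as $b \to \infty$,

$$E\left\|\mathbf{X}_m^{(\ell_b)} - g_{d_m}\big(F_m(\mathbf{X}_1^{(\ell_b)}, \ldots, \mathbf{X}_k^{(\ell_b)}, \mathbf{Z}_1^{(\ell_b)}, \ldots, \mathbf{Z}_N^{(\ell_b)})\big)\right\|^2 \to E\left\|\mathbf{X}_m^{(0)} - g_{d_m}\big(F_m(\mathbf{X}_1^{(0)}, \ldots, \mathbf{X}_k^{(0)}, \mathbf{Z}_1^G, \ldots, \mathbf{Z}_N^G)\big)\right\|^2 \le n(D_m + \epsilon/2). \tag{24}$$

Therefore, we can choose $b$ sufficiently large so that

$$\frac{1}{n} E\left\|\mathbf{X}_m^{(\ell_b)} - g_{d_m}\big(F_m(\mathbf{X}_1^{(\ell_b)}, \ldots, \mathbf{X}_k^{(\ell_b)}, \mathbf{Z}_1^{(\ell_b)}, \ldots, \mathbf{Z}_N^{(\ell_b)})\big)\right\|^2 \le D_m + \epsilon. \tag{25}$$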



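The expected distortion of code $\tilde C$ (with blocklength $nb$) thus satisfies

$$\frac{1}{b}\sum_{\ell=0}^{b-1}\frac{1}{n} E\left\|\mathbf{X}_m^{(\ell)} - g_{d_m}\big(F_m(\mathbf{X}_1^{(\ell)}, \ldots, \mathbf{X}_k^{(\ell)}, \mathbf{Z}_1^{(\ell)}, \ldots, \mathbf{Z}_N^{(\ell)})\big)\right\|^2 \le \frac{1}{n}\max_{0 \le \ell \le b-1} E\left\|\mathbf{X}_m^{(\ell)} - g_{d_m}\big(F_m(\mathbf{X}_1^{(\ell)}, \ldots, \mathbf{X}_k^{(\ell)}, \mathbf{Z}_1^{(\ell)}, \ldots, \mathbf{Z}_N^{(\ell)})\big)\right\|^2 \le D_m + \epsilon,$$

for $m = 1, \ldots, k$, since (24) holds in particular for the sequence

$$\ell_b = \arg\max_{0 \le \ell \le b-1} E\left\|\mathbf{X}_m^{(\ell)} - g_{d_m}\big(F_m(\mathbf{X}_1^{(\ell)}, \ldots, \mathbf{X}_k^{(\ell)}, \mathbf{Z}_1^{(\ell)}, \ldots, \mathbf{Z}_N^{(\ell)})\big)\right\|^2$$

for $b = 1, 2, \ldots$. This concludes the proof of Theorem 2. ∎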

Note 3. Note that the setup in this section (Theorem 2 or equivalently Theorem 4) assumes quadratic distortion as the criterion of distortion between the source and its reconstruction. Indeed, the arguments in the proofs of some of the supporting lemmas above do depend on the nature of the distortion metric. However, the overall idea of transforming the source or the channel into approximately Gaussian, and then constructing encoding-decoding schemes with finite encoding or reading precision, is independent of the nature of the distortion metric. It can be shown that, under mild conditions, our results carry over to other distortion metrics. This in general is not true for the setup in Theorem 1 (or equivalently Theorem 3), where the proof hinges, among other things, on the fact that the distortion in the transform domain and in the original domain is the same.

VI. Conclusion

We considered the problem of distributed compression of correlated sources over a network, for which we established two 
complementary worst-case results. The first one is that, under a source covariance constraint, the worst-case source is Gaussian. 
The second is that, for additive-noise networks where the noises satisfy a covariance constraint, the worst-case noise is also 
Gaussian. These results provide a theoretical justification for the common adoption of Gaussian models for both the source 
and the additive noise in distributed compression problems. 

Our approach to establish these results is constructive, in that we describe a systematic way of converting coding schemes 
designed under Gaussian assumptions into coding schemes that can handle non-Gaussian assumptions. The idea behind the 
construction of such schemes is simple both conceptually and algorithmically, as the DFT transform can be implemented via 
the tractable FFT, and the remaining part is to employ a good scheme for the multi-terminal Gaussian source or channel, for 
which there is a well-developed and growing body of literature. 
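
As a small illustration of the implementation remark above, the following sketch applies a unitary DFT to every length-$b$ block with an FFT. It assumes NumPy; the complex unitary DFT is used here only as a stand-in, since the transform $Q$ used in the proofs is the one defined in (5) and may differ (for instance, a real-valued variant).

```python
import numpy as np

rng = np.random.default_rng(0)
n, b = 4, 16
blocks = rng.standard_normal((n, b))       # n blocks of length b

# norm="ortho" scales the DFT by 1/sqrt(b), which makes the transform unitary.
transformed = np.fft.fft(blocks, axis=1, norm="ortho")
recovered = np.fft.ifft(transformed, axis=1, norm="ortho")

energy_in = float(np.sum(blocks ** 2))
energy_out = float(np.sum(np.abs(transformed) ** 2))
```

Each block costs on the order of $b \log b$ operations this way, rather than the $b^2$ of a dense matrix product.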

Another interesting aspect of our framework for converting coding schemes designed under Gaussian assumptions into 
coding schemes for non-Gaussian models is that it only requires the mean and the covariance matrix of the sources or the 
additive noises. Thus, this code conversion scheme can be seen as a way to design coding schemes for distributed compression 
problems where only the mean and the covariance matrix of the sources or the additive noises are known. Therefore, this work 
may provide tools for future research in establishing inner bounds in the distortion region of distributed compression problems 
with unknown source or noise distributions. 

Another possible research direction stemming from this work concerns finding outer bounds to the distortion region of 
Gaussian problems. Notice that, even for the Gaussian $k$-source encoding problem, the rate-distortion region is unknown
for k > 2, and finding nontrivial outer bounds is in general difficult. In this work, we showed that, given the appropriate 
covariance constraints, Gaussian sources and Gaussian additive noises are worst-case assumptions. This means that the distortion 
region under non-Gaussian assumptions contains the Gaussian distortion region. Thus, by choosing special source and noise 
distributions (e.g., discrete distributions), it may be possible to obtain distributed compression problems where interesting outer 
bounds can be derived. Our results would then imply that any outer bound derived in this manner is also an outer bound on 
the distortion region of the corresponding Gaussian problem. 
Appendix A
Proof of Lemma 1

Clearly, it suffices to show that $\big(\tilde X_1^{(\ell_b)}[0], \ldots, \tilde X_k^{(\ell_b)}[0]\big)$ converges in distribution to a jointly Gaussian random vector with covariance matrix $K$, as $b \to \infty$. In order to use the Cramér-Wold Theorem [7], we fix an arbitrary vector $(t_1, \ldots, t_k) \in \mathbb{R}^k$ and we notice that

$$\sum_{m=1}^{k} t_m \tilde X_m^{(\ell_b)}[0] = \sum_{m=1}^{k} t_m \sum_{j=0}^{b-1} X_m[j]\,Q(\ell_b, j) = \sum_{j=0}^{b-1}\Big(\sum_{m=1}^{k} t_m X_m[j]\Big) Q(\ell_b, j). \tag{26}$$

To characterize the convergence in distribution of (26), we will need the following result.

Theorem 5 (Lindeberg's Central Limit Theorem). Suppose that for each $b = 1, 2, \ldots$, the random variables $Y_{b,1}, Y_{b,2}, \ldots, Y_{b,b}$ are independent. In addition, suppose that, for all $b$ and $i \le b$, $E[Y_{b,i}] = 0$, and let

$$s_b^2 = \sum_{i=1}^{b} E\big[Y_{b,i}^2\big]. \tag{27}$$

Then, if for all $\epsilon > 0$, Lindeberg's condition

$$\frac{1}{s_b^2}\sum_{i=1}^{b} E\Big(Y_{b,i}^2\, \mathbf{1}\{|Y_{b,i}| > \epsilon s_b\}\Big) \to 0 \quad \text{as } b \to \infty \tag{28}$$

holds, we have that

$$\frac{\sum_{i=1}^{b} Y_{b,i}}{s_b} \xrightarrow{d} \mathcal{N}(0, 1).$$



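To apply Lindeberg's CLT, we will let, for $j = 0, \ldots, b-1$,

$$Y_{b,j+1} = \sqrt{b}\,\Big(\sum_{m=1}^{k} t_m X_m[j]\Big) Q(\ell_b, j).$$

Then, if we let $K_{u,v}$ be the entry in the $u$th row and $v$th column of $K$, we have

$$s_b^2 = \sum_{j=1}^{b} E\big[Y_{b,j}^2\big] = b \sum_{j=1}^{b} Q^2(\ell_b, j-1)\, E\left[\Big(\sum_{m=1}^{k} t_m X_m[j-1]\Big)^2\right] = b \sum_{1 \le u,v \le k} t_u t_v K_{u,v} \sum_{j=1}^{b} Q^2(\ell_b, j-1) = b \sum_{1 \le u,v \le k} t_u t_v K_{u,v},$$

regardless of the value of $\ell_b$. In order to verify Lindeberg's condition, we define $\sigma^2 = \sum_{1 \le u,v \le k} t_u t_v K_{u,v}$ and we let $U_{b,j} = Y_{b,j}^2\, \mathbf{1}\{|Y_{b,j}| > \epsilon s_b\} = Y_{b,j}^2\, \mathbf{1}\{|Y_{b,j}| > \epsilon\sigma\sqrt{b}\}$. Consider any sequence $j_b$, for $b = 1, 2, \ldots$, such that $j_b \in \{1, \ldots, b\}$, and any $\delta > 0$. Then we have that

$$\Pr\big(U_{b,j_b} < \delta\big) \ge \Pr\big(|Y_{b,j_b}| \le \epsilon\sigma\sqrt{b}\big) \ge \Pr\left(\sqrt{2}\,\Big|\sum_{m=1}^{k} t_m X_m[j_b - 1]\Big| \le \epsilon\sigma\sqrt{b}\right) = \Pr\left(\Big|\sum_{m=1}^{k} t_m X_m[0]\Big| \le \epsilon\sigma\sqrt{b/2}\right) \to 1,$$

as $b \to \infty$, which means that $U_{b,j_b} \xrightarrow{p} 0$ (i.e., $U_{b,j_b}$ converges in probability to 0) as $b \to \infty$. Moreover, we have that

$$U_{b,j_b} \le Y_{b,j_b}^2 \le 2\Big(\sum_{m=1}^{k} t_m X_m[j_b - 1]\Big)^2,$$

for $b = 1, 2, \ldots$, and

$$E\left[2\Big(\sum_{m=1}^{k} t_m X_m[0]\Big)^2\right] = 2\sigma^2 < \infty.$$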



Thus by the Dominated Convergence Theorem (see Appendix H), we have that $E[U_{b,j_b}] \to 0$ as $b \to \infty$. We conclude that

$$\frac{1}{s_b^2}\sum_{j=1}^{b} E\Big(Y_{b,j}^2\, \mathbf{1}\{|Y_{b,j}| > \epsilon s_b\}\Big) = \frac{1}{\sigma^2 b}\sum_{j=1}^{b} E[U_{b,j}] \le \frac{1}{\sigma^2}\max_{1 \le j \le b} E[U_{b,j}] \to 0,$$

as $b \to \infty$, and Lindeberg's condition (28) is satisfied for any $\epsilon > 0$. Hence, from Theorem 5 we have that

$$\frac{\sum_{j=1}^{b} Y_{b,j}}{\sigma\sqrt{b}} \xrightarrow{d} \mathcal{N}(0, 1),$$

which implies, from (26), that

$$\sum_{m=1}^{k} t_m \tilde X_m^{(\ell_b)}[0] = \sum_{j=0}^{b-1}\Big(\sum_{m=1}^{k} t_m X_m[j]\Big) Q(\ell_b, j) = \frac{\sum_{j=1}^{b} Y_{b,j}}{\sqrt{b}} \xrightarrow{d} \mathcal{N}(0, \sigma^2).$$

Finally, since for a jointly Gaussian vector $(X_1^G, \ldots, X_k^G)$ with mean zero and covariance matrix $K$, we have $\sum_{m=1}^{k} t_m X_m^G \sim \mathcal{N}(0, \sigma^2)$, we conclude, from the Cramér-Wold Theorem, that $\big(\tilde X_1^{(\ell_b)}[0], \ldots, \tilde X_k^{(\ell_b)}[0]\big)$ converges in distribution to a jointly Gaussian random vector with zero mean and covariance matrix $K$, as $b \to \infty$.



Appendix B
Proof of Lemma 2

From a coding scheme $C$ with blocklength $n$ achieving distortion vector $(D_1, \ldots, D_k)$, we will create a sequence of coding schemes $C^{(m)}$, $m = 1, 2, \ldots$, obtained by clipping the output of the decoding functions $g_{d_j}$, $j = 1, \ldots, k$. More precisely, coding scheme $C^{(m)}$ has the same encoding and relaying functions as $C$, and decoding functions $g_{d_j}^{(m)}$ whose $i$th component is defined as

$$g_{d_j}^{(m)}(y_1, \ldots, y_n)[i] = \begin{cases} m, & \text{if } g_{d_j}(y_1, \ldots, y_n)[i] > m \\ -m, & \text{if } g_{d_j}(y_1, \ldots, y_n)[i] < -m \\ g_{d_j}(y_1, \ldots, y_n)[i], & \text{otherwise} \end{cases}$$

for $j = 1, \ldots, k$, and $i = 0, \ldots, n-1$. Now, consider a fixed $j \in \{1, \ldots, k\}$, and define, for $i = 0, \ldots, n-1$, the event $B_i$ as

$$B_i = \big\{X_j[i] > m,\; g_{d_j}(Y_{d_j}^n)[i] > m\big\} \cup \big\{X_j[i] < -m,\; g_{d_j}(Y_{d_j}^n)[i] < -m\big\}.$$

It is easy to verify that the complementary event is given by

$$B_i^c = \big\{|X_j[i]| \le m\big\} \cup \big\{|g_{d_j}(Y_{d_j}^n)[i]| \le m\big\} \cup \big\{X_j[i] > m,\; g_{d_j}(Y_{d_j}^n)[i] < -m\big\} \cup \big\{X_j[i] < -m,\; g_{d_j}(Y_{d_j}^n)[i] > m\big\}.$$

For each of the four sub-events in $B_i^c$, it is clear that

$$\big|X_j[i] - g_{d_j}(Y_{d_j}^n)[i]\big| \ge \big|X_j[i] - g_{d_j}^{(m)}(Y_{d_j}^n)[i]\big|.$$

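Thus, we can upper bound the expected distortion of the output of decoder $j$ of $C^{(m)}$ as

$$E\left\|\mathbf{X}_j - g_{d_j}^{(m)}(Y_{d_j}^n)\right\|^2 = E\left[\sum_{i=0}^{n-1}\big(X_j[i] - g_{d_j}^{(m)}(Y_{d_j}^n)[i]\big)^2\right]$$
$$= E\left[\sum_{i=0}^{n-1}\big(X_j[i] - g_{d_j}^{(m)}(Y_{d_j}^n)[i]\big)^2\, \mathbf{1}_{B_i^c}\right] + E\left[\sum_{i=0}^{n-1}\big(X_j[i] - g_{d_j}^{(m)}(Y_{d_j}^n)[i]\big)^2\, \mathbf{1}_{B_i}\right]$$
$$\le E\left[\sum_{i=0}^{n-1}\big(X_j[i] - g_{d_j}(Y_{d_j}^n)[i]\big)^2\, \mathbf{1}_{B_i^c}\right] + E\left[\sum_{i=0}^{n-1}\big(X_j[i] - g_{d_j}^{(m)}(Y_{d_j}^n)[i]\big)^2\, \mathbf{1}_{B_i}\right]$$
$$\le E\left\|\mathbf{X}_j - g_{d_j}(Y_{d_j}^n)\right\|^2 + \sum_{i=0}^{n-1} E\big[(X_j[i])^2\, \mathbf{1}_{B_i}\big]$$
$$\le n D_j + n\,E\big[(X_j[0])^2\, \mathbf{1}_{B_0}\big].$$

Since $\big|X_j[0]^2\, \mathbf{1}_{B_0}\big| \le X_j[0]^2$, $E\big[X_j[0]^2\big] < \infty$, and $X_j[0]^2\, \mathbf{1}_{B_0} \xrightarrow{p} 0$ as $m \to \infty$, by the Dominated Convergence Theorem (see Appendix H),

$$\lim_{m \to \infty} E\big[(X_j[0])^2\, \mathbf{1}_{B_0}\big] = 0.$$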



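Therefore, for any $\epsilon > 0$, we can pick $m = M$ large enough so that

$$\frac{1}{n} E\left\|\mathbf{X}_j - g_{d_j}^{(M)}(Y_{d_j}^n)\right\|^2 \le D_j + \epsilon \quad \text{and} \quad \left\|g_{d_j}^{(M)}(Y_{d_j}^n)\right\|_\infty \le M,$$

for all $j = 1, \ldots, k$, and we may take $C^{(M)}$ as the desired coding scheme.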
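
The clipping step and the per-component bound used in this proof can be checked with a short sketch. It assumes NumPy; the Gaussian samples standing in for the source components and the decoder outputs are arbitrary test data, and the function name `clip_decoder_output` is hypothetical.

```python
import numpy as np

def clip_decoder_output(g, m):
    """Clip every component of the decoder output to the interval [-m, m]."""
    return np.clip(g, -m, m)

rng = np.random.default_rng(0)
m = 1.0
x = 2.0 * rng.standard_normal(100_000)   # stand-in source samples X_j[i]
g = 2.0 * rng.standard_normal(100_000)   # stand-in decoder outputs g_{d_j}(.)[i]
g_m = clip_decoder_output(g, m)

# Event B_i: source and unclipped decoder output exceed the clipping level on the same side.
in_B = ((x > m) & (g > m)) | ((x < -m) & (g < -m))

err_clipped = (x - g_m) ** 2
# Outside B_i clipping cannot increase the error; inside B_i the error is at most x^2.
bound = np.where(in_B, x ** 2, (x - g) ** 2)
```

The per-sample inequality `err_clipped <= bound` is the pointwise version of the bound that the proof then averages.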
Appendix C
Proof of Lemma 3

For the sake of simplicity, we will consider the case $k = 2$ and $i = 2$ (i.e., $Y_1$ is quantized to $\tilde Y_1$). The proof for $k > 2$ follows via a straightforward generalization. The proof follows similar lines of thought as Lemma 3 in [5]; we state here the required steps for completeness. The density $f_{\tilde Y_1, Y_2}(y_1, y_2)$ can be written for almost all tuples $(y_1, y_2)$ as

$$f_{\tilde Y_1, Y_2}(y_1, y_2) = 2^p\, E\big[\mathbf{1}\{y_1 - \lfloor Y_1 \rfloor_p \in (-2^{-p-1}, 2^{-p-1})\} \,\big|\, Y_2 = y_2\big]\, f_{Y_2}(y_2)$$
$$= 2^p \Pr\big[y_1 - \lfloor Y_1 \rfloor_p \in (-2^{-p-1}, 2^{-p-1}) \,\big|\, Y_2 = y_2\big]\, f_{Y_2}(y_2)$$
$$= 2^p \Pr\big[\lfloor Y_1 \rfloor_p \in (y_1 - 2^{-p-1}, y_1 + 2^{-p-1}) \,\big|\, Y_2 = y_2\big]\, f_{Y_2}(y_2)$$
$$= 2^p \Pr\big[\lfloor 2^p Y_1 \rfloor \in (y_1 2^p - \tfrac12, y_1 2^p + \tfrac12) \,\big|\, Y_2 = y_2\big]\, f_{Y_2}(y_2)$$
$$= 2^p \Pr\big[2^p Y_1 \in [\lceil y_1 2^p - \tfrac12 \rceil, \lceil y_1 2^p + \tfrac12 \rceil) \,\big|\, Y_2 = y_2\big]\, f_{Y_2}(y_2)$$
$$= 2^p \Pr\big[Y_1 \in [2^{-p}\lceil y_1 2^p - \tfrac12 \rceil, 2^{-p}\lceil y_1 2^p + \tfrac12 \rceil) \,\big|\, Y_2 = y_2\big]\, f_{Y_2}(y_2)$$
$$= 2^p \int_{a_p}^{b_p} f_{Y_1, Y_2}(x_1, y_2)\, dx_1, \tag{29}$$

where $a_p = 2^{-p}\lceil y_1 2^p - \tfrac12 \rceil$ and $b_p = 2^{-p}\lceil y_1 2^p + \tfrac12 \rceil$, such that $b_p = a_p + 2^{-p}$, which implies $a_p \to y_1$. What is left to prove is that

$$\lim_{p \to \infty} 2^p \int_{a_p}^{b_p} f_{Y_1, Y_2}(x_1, y_2)\, dx_1 = f_{Y_1, Y_2}(y_1, y_2)$$

for almost all tuples $(y_1, y_2)$. But this follows using the proof of Lemma 3 in [5], replacing the integrand function appropriately.
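
The limiting step can be illustrated numerically in a simplified scalar setting. The sketch below replaces the joint density by a standard normal density (an assumption made only for illustration) and evaluates $2^p$ times its integral over a window of width $2^{-p}$ starting at $a_p = 2^{-p}\lceil y_1 2^p - \tfrac12\rceil$; the function names are hypothetical.

```python
import math

def window(y1, p):
    """Return (a_p, b_p) with a_p = 2^-p * ceil(y1 * 2^p - 1/2) and b_p = a_p + 2^-p."""
    a = math.ceil(y1 * 2 ** p - 0.5) / 2 ** p
    return a, a + 2.0 ** -p

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def smoothed_density(y1, p):
    """2^p times the integral of the standard normal density over [a_p, b_p)."""
    a, b = window(y1, p)
    return 2 ** p * (normal_cdf(b) - normal_cdf(a))

y1 = 0.3
errors = [abs(smoothed_density(y1, p) - normal_pdf(y1)) for p in (2, 6, 10, 14)]
```

As $p$ grows, the window shrinks around $y_1$ and the window average approaches the density at $y_1$, which is the behaviour the lemma formalizes for almost all points.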

Appendix D
Proof of Lemma 4

We prove the lemma by induction on the size $t$ of the random vector $Y$. If $Y$ is a scalar, i.e., $t = 1$, let $g_u(y) = F_{Y|U}(y|u)$, where $F_{Y|U}$ is the conditional distribution function of $Y$ given $U$. Then we let $Q$ be a uniform random variable on $[0, 1]$ (independent of $U$), and we let $h(u, q) = g_u^{-1}(q)$ (where $^{-1}$ represents the generalized inverse). It is then clear that $h(u, Q)$ is distributed as $Y$ conditioned on $U = u$ for any $u$, which implies that $(h(U, Q), U)$ is distributed as $(Y, U)$.

Now suppose the lemma is true when the size of $Y$ is $t$. Consider a random vector $Y' = (Z^t, Y)$, where $Z^t$ has size $t$ and $Y$ is a scalar. Then there exists a random vector $Q'$ and a function $h'$ such that $(h'(U, Q'), U)$ is distributed as $(Z^t, U)$. Now let $g_{u,z^t}(y) = F_{Y|U,Z^t}(y|u, z^t)$ be the conditional distribution function of $Y$ given $U$ and $Z^t$. Then we let $Q = (Q', Q'')$, where $Q''$ is a uniform random variable on $[0, 1]$ (independent of $U$ and $Q'$), and we let $h(u, (q', q'')) = g_{u, h'(u, q')}^{-1}(q'')$. Then $(h(U, Q), U)$ is distributed as $(Y, U)$, and $(h'(U, Q'), h(U, Q), U)$ is distributed as $(Y', U) = (Z^t, Y, U)$.
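
The scalar base case is the familiar inverse-transform construction, which can be sketched as follows. The conditional law used here (exponential with rate $u$) is a hypothetical example chosen because its distribution function has a closed-form inverse; only the Python standard library is assumed.

```python
import math
import random

def cond_cdf(y, u):
    """Example conditional distribution function F_{Y|U}(y|u) = 1 - exp(-u*y), y >= 0."""
    return 1.0 - math.exp(-u * y)

def h(u, q):
    """Inverse of cond_cdf in y: maps the channel input u and a uniform q to an output."""
    return -math.log(1.0 - q) / u

rng = random.Random(0)
u = 2.0
# Feeding independent uniform variables through h reproduces the law of Y given U = u.
samples = [h(u, rng.random()) for _ in range(200_000)]
sample_mean = sum(samples) / len(samples)
```

The randomness of the channel is thereby moved into the auxiliary uniform variable, and the map from input to output is deterministic once that variable is fixed.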

Appendix E
Proof of Lemma 5

Achievability of the distortion tuple $(D_1, \ldots, D_k)$ implies the existence of a coding scheme $C$ with block length $n$, such that

$$\frac{1}{n} E\big[\|\mathbf{X}_m - \hat{\mathbf{X}}_m\|^2\big] \le D_m, \quad \forall\, m \in [1:k]. \tag{30}$$

Using Lemma 2, without loss of generality we will suppose that

$$\|g_{d_j}(y_1, \ldots, y_n)\|_\infty \le M,$$

for each destination $d_j \in \mathcal{D}$, for a fixed $M > 0$. Note that, using Lemma 4, the memoryless channel $f_{Y_1,\ldots,Y_N|U_1,\ldots,U_N}$ can be equivalently represented as a deterministic channel $Y_i = h_i(U_1, \ldots, U_N, Z)$, $\forall\, i \in [1:N]$, where $Z$ is a random vector, independent of the channel inputs $(U_1, \ldots, U_N)$. Thus for a fixed block length $n$, given the description of our encoding procedure, we can write, for some functions $F_i$ depending on $h_i$, $\mathbf{Y}_i = F_i(\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_k, \mathbf{Z})$, $\forall\, i \in [1:N]$, as the evolution of the system depends only on the sources and the random vector $\mathbf{Z}$. Thus, noting that the reconstruction for the $m$th source is $\hat{\mathbf{X}}_m = g_{d_m}(\mathbf{Y}_{d_m})$, the above equation on distortion constraints can be equivalently written as

$$\frac{1}{n} E\left\|\mathbf{X}_m - g_{d_m}\big(F_m(\mathbf{X}_1, \ldots, \mathbf{X}_k, \mathbf{Z})\big)\right\|^2 \le D_m, \quad \forall\, m \in [1:k]. \tag{31}$$

To prove this lemma we have to show that, given an $\epsilon > 0$, we can construct a scheme $C_{\mathbf{p}}$ for some $\mathbf{p} = [p_1, \ldots, p_k] \in \mathbb{N}^k$, where the encoding function at each source $s_m \in \mathcal{S}$ satisfies

$$\tilde f_{s_m,t}(x^n, y^{t-1}) = f_{s_m,t}\big(\lfloor x^n \rfloor_{p_m}, y^{t-1}\big), \quad \forall\, m \in [1:k],$$

for any $x^n \in \mathbb{R}^n$, any $y^{t-1} \in \mathbb{R}^{t-1}$, and any time $t$, such that

$$\frac{1}{n} E\left\|\mathbf{X}_m - g_{d_m}\big(F_m(\lfloor \mathbf{X}_1 \rfloor_{p_1}, \ldots, \lfloor \mathbf{X}_k \rfloor_{p_k}, \mathbf{Z})\big)\right\|^2 \le D_m + \epsilon, \quad \forall\, m \in [1:k]. \tag{32}$$



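To prove this, we consider the following randomized encoding scheme $C_{\mathbf{p}}$. Note that, in our definition of schemes, the encoding, relaying and decoding operations were defined to be deterministic, but for the time being we will allow for randomization and later show that it can be dispensed with. The scheme $C_{\mathbf{p}}$, operated in blocks of length $n$, uses the same relay encoding and destination encoding and decoding functions, the only change being in the source encoding. At the source node $s_m$ the source is encoded as $U_{s_m,t} = f_{s_m,t}(\tilde{\mathbf{X}}_m, Y^{t-1})$, $\forall\, t \in [1:n]$, where $\tilde{\mathbf{X}}_m = \{\tilde X_m[t]\}_{t=0}^{n-1}$, such that $\tilde X_m[t] = \lfloor X_m[t] \rfloor_{p_m} + V_{p_m}$, where $V_{p_m}$ is a random variable independent of the sources in the network, uniformly distributed in $(-2^{-p_m-1}, 2^{-p_m-1})$. Consider

$$\frac{1}{n} E\left\|\mathbf{X}_m - g_{d_m}\big(F_m(\tilde{\mathbf{X}}_1, \ldots, \tilde{\mathbf{X}}_k, \mathbf{Z})\big)\right\|^2 \le \underbrace{\frac{1}{n} E\left\|\mathbf{X}_m - \tilde{\mathbf{X}}_m\right\|^2}_{(I)} + \underbrace{\frac{1}{n} E\left\|\tilde{\mathbf{X}}_m - g_{d_m}\big(F_m(\tilde{\mathbf{X}}_1, \ldots, \tilde{\mathbf{X}}_k, \mathbf{Z})\big)\right\|^2}_{(II)} + \underbrace{\frac{2}{n} E\Big[\left\|\mathbf{X}_m - \tilde{\mathbf{X}}_m\right\| \left\|\tilde{\mathbf{X}}_m - g_{d_m}\big(F_m(\tilde{\mathbf{X}}_1, \ldots, \tilde{\mathbf{X}}_k, \mathbf{Z})\big)\right\|\Big]}_{(III)}. \tag{33}$$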

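Note that

$$\big|X_m[t] - \tilde X_m[t]\big| = \big|X_m[t] - \lfloor X_m[t] \rfloor_{p_m} - V_{p_m}\big| \le |V_{p_m}| + 2^{-p_m}\big|2^{p_m} X_m[t] - \lfloor 2^{p_m} X_m[t] \rfloor\big| \le 2^{-p_m-1} + 2^{-p_m} \le 2^{-p_m+1}, \tag{34}$$

which implies $\|\mathbf{X}_m - \tilde{\mathbf{X}}_m\| \le \sqrt{n}\, 2^{1-p_m}$. This further implies that the term (I) of (33) is bounded as

$$\frac{1}{n} E\left\|\mathbf{X}_m - \tilde{\mathbf{X}}_m\right\|^2 \le 2^{2-2p_m}, \tag{35}$$

implying that, in the limit, term (I) vanishes. Define the (measurable) functions $H_m(\cdot): \underbrace{\mathbb{R}^n \times \cdots \times \mathbb{R}^n}_{k+1 \text{ times}} \to \mathbb{R}$, $\forall\, m \in [1:k]$, as

$$H_m(\mathbf{y}_1, \ldots, \mathbf{y}_k, \mathbf{z}) = \left\|\mathbf{y}_m - g_{d_m}\big(F_m(\mathbf{y}_1, \ldots, \mathbf{y}_k, \mathbf{z})\big)\right\|^2. \tag{36}$$

Since $\mathbf{Z}$ is independent of the sources, using Lemma 3 we have the following convergence of the joint densities,

$$\lim_{p_m \to \infty} f(\mathbf{X}_1, \ldots, \tilde{\mathbf{X}}_m, \ldots, \mathbf{X}_k, \mathbf{Z}) = f(\mathbf{X}_1, \ldots, \mathbf{X}_m, \ldots, \mathbf{X}_k, \mathbf{Z}), \quad \forall\, m \in [1:k]. \tag{37}$$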

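Using the above result we have that term (II) in (33) satisfies

$$\lim_{p_1 \to \infty} \cdots \lim_{p_k \to \infty} \frac{1}{n} E\left\|\tilde{\mathbf{X}}_m - g_{d_m}\big(F_m(\tilde{\mathbf{X}}_1, \ldots, \tilde{\mathbf{X}}_k, \mathbf{Z})\big)\right\|^2 = \lim_{p_1 \to \infty} \cdots \lim_{p_k \to \infty} \frac{1}{n} E\big[H_m(\tilde{\mathbf{X}}_1, \ldots, \tilde{\mathbf{X}}_k, \mathbf{Z})\big]$$
$$\overset{(a)}{=} \lim_{p_1 \to \infty} \cdots \lim_{p_{k-1} \to \infty} \frac{1}{n} E\big[H_m(\tilde{\mathbf{X}}_1, \ldots, \tilde{\mathbf{X}}_{k-1}, \mathbf{X}_k, \mathbf{Z})\big]$$
$$\overset{(b)}{=} \frac{1}{n} E\big[H_m(\mathbf{X}_1, \ldots, \mathbf{X}_k, \mathbf{Z})\big] = \frac{1}{n} E\left\|\mathbf{X}_m - g_{d_m}\big(F_m(\mathbf{X}_1, \ldots, \mathbf{X}_k, \mathbf{Z})\big)\right\|^2 \le D_m, \tag{38}$$

where (a) follows from the fact that pointwise convergence of the density implies convergence in distribution of a (measurable) function of the random variable, and this implies convergence in expectation via the Dominated Convergence Theorem (see Appendix H), as we have from the fact that $g_{d_m}(\cdot)$ is bounded (say by $M$),

$$\frac{1}{n} E\left\|\tilde{\mathbf{X}}_m - g_{d_m}\big(F_m(\tilde{\mathbf{X}}_1, \ldots, \tilde{\mathbf{X}}_k, \mathbf{Z})\big)\right\|^2 \le \frac{2}{n} E\Big(\|\tilde{\mathbf{X}}_m\|^2 + \left\|g_{d_m}\big(F_m(\tilde{\mathbf{X}}_1, \ldots, \tilde{\mathbf{X}}_k, \mathbf{Z})\big)\right\|^2\Big) \le \frac{2}{n} E\big(\|\tilde{\mathbf{X}}_m\|^2 + nM^2\big) < \infty, \tag{39}$$

and (b) follows from similarly repeating (a) by taking one limit at a time.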
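Now bounding the cross term (III) in (33),

$$\frac{2}{n} E\Big[\left\|\mathbf{X}_m - \tilde{\mathbf{X}}_m\right\| \left\|\tilde{\mathbf{X}}_m - g_{d_m}\big(F_m(\tilde{\mathbf{X}}_1, \ldots, \tilde{\mathbf{X}}_k, \mathbf{Z})\big)\right\|\Big] \le \frac{2}{n}\sqrt{n}\, 2^{1-p_m}\, E\left\|\tilde{\mathbf{X}}_m - g_{d_m}\big(F_m(\tilde{\mathbf{X}}_1, \ldots, \tilde{\mathbf{X}}_k, \mathbf{Z})\big)\right\| \le 2^{2-p_m}\sqrt{\frac{1}{n} E\left\|\tilde{\mathbf{X}}_m - g_{d_m}\big(F_m(\tilde{\mathbf{X}}_1, \ldots, \tilde{\mathbf{X}}_k, \mathbf{Z})\big)\right\|^2}, \tag{40}$$

where the first step uses $\|\mathbf{X}_m - \tilde{\mathbf{X}}_m\| \le \sqrt{n}\, 2^{1-p_m}$ and the second uses Jensen's inequality. Using the bound on the term (II), this implies that in the limit this term is bounded by $2^{2-p_m}\sqrt{D_m}$, which vanishes. Hence we have proved that

$$\lim_{p_1 \to \infty} \cdots \lim_{p_k \to \infty} \frac{1}{n} E\left\|\mathbf{X}_m - g_{d_m}\big(F_m(\tilde{\mathbf{X}}_1, \ldots, \tilde{\mathbf{X}}_k, \mathbf{Z})\big)\right\|^2 \le \frac{1}{n} E\left\|\mathbf{X}_m - g_{d_m}\big(F_m(\mathbf{X}_1, \ldots, \mathbf{X}_k, \mathbf{Z})\big)\right\|^2 \le D_m. \tag{41}$$

Thus for any $\epsilon > 0$, we can choose $\mathbf{p} \in \mathbb{N}^k$ with components large enough so that $C_{\mathbf{p}}$ achieves the distortion tuple $(D_1 + \epsilon, \ldots, D_k + \epsilon)$. What is left is to show that we can dispense with random encoders. This is argued in a standard manner by choosing the best randomizations $V_{p_m}$ at the respective encoders, as done in [5].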
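
The quantize-and-dither step of the randomized scheme, together with the per-sample bound $|X_m[t] - \tilde X_m[t]| \le 2^{1-p_m}$, can be sketched as follows. The sketch assumes NumPy and takes $\lfloor x \rfloor_p = 2^{-p}\lfloor 2^p x \rfloor$, as used in the chain of inequalities above; the function names are hypothetical.

```python
import numpy as np

def quantize(x, p):
    """Finite-precision truncation: floor(x * 2^p) / 2^p."""
    return np.floor(x * 2.0 ** p) / 2.0 ** p

def dithered(x, p, rng):
    """Quantize to precision p and add independent uniform dither on (-2^-(p+1), 2^-(p+1))."""
    half_width = 2.0 ** (-(p + 1))
    v = rng.uniform(-half_width, half_width, size=np.shape(x))
    return quantize(x, p) + v

rng = np.random.default_rng(0)
n, p = 1000, 6
x = rng.standard_normal(n)           # stand-in source block
x_tilde = dithered(x, p, rng)
per_sample_gap = np.abs(x - x_tilde)
```

The dither makes the quantized source continuous again, so that it has a density to which the convergence result of Lemma 3 can be applied.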



Appendix F
Proof of Lemma 6

The proof of Lemma 6 follows similar steps to those in the proof of Lemma 5. We start by noticing that, by definition, if the distortion tuple $(D_1, \ldots, D_k)$ is achievable on an AWGN network, then we must have a coding scheme $C$ with block length $n$, such that

$$\frac{1}{n} E\big[\|\mathbf{X}_m - \hat{\mathbf{X}}_m\|^2\big] \le D_m, \quad \forall\, m \in [1:k]. \tag{42}$$

Using Lemma 2, without loss of generality we will suppose that

$$\|g_{d_j}(y_1, \ldots, y_n)\|_\infty \le M,$$

for each destination $d_j \in \mathcal{D}$. We will build a randomized coding scheme $C_{\mathbf{p}}$, for $\mathbf{p} = (p_1, \ldots, p_N)$, by defining the encoding function $\tilde f_{s_m,t}$ at each source $s_m \in \mathcal{S}$ as

$$\tilde f_{s_m,t}\big(\mathbf{x}_m, y_{s_m}[0 : t-1]\big) = f_{s_m,t}\big(\mathbf{x}_m, y_{s_m}^{(\mathbf{p})}[0 : t-1]\big),$$

and the encoding functions $\tilde f_{i,t}$ at each node $i \in \mathcal{R} \cup \mathcal{D}$ as

$$\tilde f_{i,t}\big(y_i[0 : t-1]\big) = f_{i,t}\big(y_i^{(\mathbf{p})}[0 : t-1]\big),$$

where $f_{s_m,t}$ and $f_{i,t}$ are the encoding functions of the original coding scheme $C$ and $Y_i^{(\mathbf{p})}[t]$ is the effective received signal at node $i$ at time $t$, obtained as

$$Y_i^{(\mathbf{p})}[t] = \lfloor Y_i[t] \rfloor_{p_i} + V_i^{(\mathbf{p})}[t],$$

where $V_i^{(\mathbf{p})}[t]$ is an i.i.d. sequence of random variables drawn uniformly from $(-2^{-p_i-1}, 2^{-p_i-1})$, independent of the transmit and receive signals in the network.

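Now let $\mathbf{X} = (\mathbf{X}_1, \ldots, \mathbf{X}_k)$ be the vector of length $nk$ with the $k$ source sequences, and let $\mathbf{Y}$ be the random vector of length $nN$ corresponding to all the received signals at all nodes during the $n$ time steps in the block if code $C$ is used. We also write $\mathbf{Y} = (\mathbf{Y}[0], \ldots, \mathbf{Y}[n-1])$, where $\mathbf{Y}[t] = (Y_1[t], \ldots, Y_N[t])$ is the random vector of received signals at all $N$ nodes at time $t$, for $0 \le t \le n-1$. Therefore, the conditional joint density of $\mathbf{Y}$ conditioned on $\mathbf{X} = \mathbf{x}$ can be expressed as

$$f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}) = \prod_{t=0}^{n-1} f_{\mathbf{Y}[t]|\mathbf{Y}[0],\ldots,\mathbf{Y}[t-1],\mathbf{X}}\big(\mathbf{y}[t] \,\big|\, \mathbf{y}[0], \ldots, \mathbf{y}[t-1], \mathbf{x}\big). \tag{43}$$

Similarly, we let $\mathbf{Y}^{(\mathbf{p})}$ be the vector of $nN$ effective received signals if code $C_{\mathbf{p}}$ is used instead. We also let $\mathbf{Y}^{(\mathbf{p})} = (\mathbf{Y}^{(\mathbf{p})}[0], \ldots, \mathbf{Y}^{(\mathbf{p})}[n-1])$, where $\mathbf{Y}^{(\mathbf{p})}[t] = (Y_1^{(\mathbf{p})}[t], \ldots, Y_N^{(\mathbf{p})}[t])$. The conditional joint density of $\mathbf{Y}^{(\mathbf{p})}$ conditioned on $\mathbf{X} = \mathbf{x}$ can be expressed as

$$f_{\mathbf{Y}^{(\mathbf{p})}|\mathbf{X}}(\mathbf{y}|\mathbf{x}) = \prod_{t=0}^{n-1} f_{\mathbf{Y}^{(\mathbf{p})}[t]|\mathbf{Y}^{(\mathbf{p})}[0],\ldots,\mathbf{Y}^{(\mathbf{p})}[t-1],\mathbf{X}}\big(\mathbf{y}[t] \,\big|\, \mathbf{y}[0], \ldots, \mathbf{y}[t-1], \mathbf{x}\big). \tag{44}$$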



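By applying Lemma 3 $N$ times, for any choice of previously received signals $\mathbf{y}[0], \dots, \mathbf{y}[t-1]$ and source sequences $\mathbf{x}$, we have that

\[ \lim_{p_1 \to \infty} \cdots \lim_{p_N \to \infty} f_{\mathbf{Y}^{(\mathbf{p})}[t] \mid \mathbf{Y}^{(\mathbf{p})}[0],\dots,\mathbf{Y}^{(\mathbf{p})}[t-1],\mathbf{X}}(\mathbf{y}[t] \mid \mathbf{y}[0],\dots,\mathbf{y}[t-1],\mathbf{x}) = f_{\mathbf{Y}[t] \mid \mathbf{Y}[0],\dots,\mathbf{Y}[t-1],\mathbf{X}}(\mathbf{y}[t] \mid \mathbf{y}[0],\dots,\mathbf{y}[t-1],\mathbf{x}), \]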



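for almost all $\mathbf{y}[t]$. Therefore, (43) and (44) imply that

\[ \lim_{p_1 \to \infty} \cdots \lim_{p_N \to \infty} \prod_{t=0}^{n-1} f_{\mathbf{Y}^{(\mathbf{p})}[t] \mid \mathbf{Y}^{(\mathbf{p})}[0],\dots,\mathbf{Y}^{(\mathbf{p})}[t-1],\mathbf{X}}(\mathbf{y}[t] \mid \mathbf{y}[0],\dots,\mathbf{y}[t-1],\mathbf{x}) = \prod_{t=0}^{n-1} f_{\mathbf{Y}[t] \mid \mathbf{Y}[0],\dots,\mathbf{Y}[t-1],\mathbf{X}}(\mathbf{y}[t] \mid \mathbf{y}[0],\dots,\mathbf{y}[t-1],\mathbf{x}), \]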



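and, in particular, we can choose a sequence $\mathbf{p}[i] = (p_1[i], \dots, p_N[i])$, $i = 1, 2, \dots$, such that

\[ \lim_{i \to \infty} \prod_{t=0}^{n-1} f_{\mathbf{Y}^{(\mathbf{p}[i])}[t] \mid \mathbf{Y}^{(\mathbf{p}[i])}[0],\dots,\mathbf{Y}^{(\mathbf{p}[i])}[t-1],\mathbf{X}}(\mathbf{y}[t] \mid \mathbf{y}[0],\dots,\mathbf{y}[t-1],\mathbf{x}) = \prod_{t=0}^{n-1} f_{\mathbf{Y}[t] \mid \mathbf{Y}[0],\dots,\mathbf{Y}[t-1],\mathbf{X}}(\mathbf{y}[t] \mid \mathbf{y}[0],\dots,\mathbf{y}[t-1],\mathbf{x}). \]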



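We conclude that $f_{\mathbf{Y}^{(\mathbf{p}[i])} \mid \mathbf{X}}(\mathbf{y} \mid \mathbf{x}) \to f_{\mathbf{Y} \mid \mathbf{X}}(\mathbf{y} \mid \mathbf{x})$ as $i \to \infty$ for almost all $\mathbf{y} \in \mathbb{R}^{nN}$ and any $\mathbf{x}$. By Scheffé's Theorem [7], pointwise convergence of the density implies convergence in total variation. This, in turn, implies convergence in total variation of $\|\mathbf{X}_m - g_{d_m}(\mathbf{Y}^{(\mathbf{p}[i])}_{d_m})\|^2$ to $\|\mathbf{X}_m - g_{d_m}(\mathbf{Y}_{d_m})\|^2$ as $i \to \infty$, which clearly implies that

\[ \|\mathbf{X}_m - g_{d_m}(\mathbf{Y}^{(\mathbf{p}[i])}_{d_m})\|^2 \stackrel{d}{\longrightarrow} \|\mathbf{X}_m - g_{d_m}(\mathbf{Y}_{d_m})\|^2. \]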
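For reference, the step from pointwise convergence of densities to convergence in total variation can be written out as follows. This is the standard form of Scheffé's theorem in generic notation: $f_n$ and $f$ are probability densities with $f_n \to f$ almost everywhere, and $P_n$, $P$ are the corresponding measures (these symbols are illustrative and are not the paper's).

```latex
\begin{align*}
\int (f - f_n)\,dy = 0
  \quad &\Longrightarrow \quad
  \int |f_n - f|\,dy = 2 \int (f - f_n)^{+}\,dy, \\
0 \le (f - f_n)^{+} \le f, \quad (f - f_n)^{+} \to 0 \ \text{a.e.}
  \quad &\Longrightarrow \quad
  \int (f - f_n)^{+}\,dy \to 0, \\
\sup_{A} \left| P_n(A) - P(A) \right|
  &= \frac{1}{2} \int |f_n - f|\,dy \;\to\; 0 .
\end{align*}
```

The second line is an application of the classical dominated convergence theorem with dominating function $f$, which is integrable because it is a density.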
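From the Dominated Convergence Theorem (Appendix H), which can be used since

\[ \|\mathbf{X}_m - g_{d_m}(\mathbf{Y}^{(\mathbf{p}[i])}_{d_m})\|^2 \le 2\left( \|\mathbf{X}_m\|^2 + \|g_{d_m}(\mathbf{Y}^{(\mathbf{p}[i])}_{d_m})\|^2 \right) \le 2\left( \|\mathbf{X}_m\|^2 + M^2 \right), \]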



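we conclude that

\[ \lim_{i \to \infty} \frac{1}{n} E\left[ \|\mathbf{X}_m - g_{d_m}(\mathbf{Y}^{(\mathbf{p}[i])}_{d_m})\|^2 \,\middle|\, \mathbf{X} = \mathbf{x} \right] = \frac{1}{n} E\left[ \|\mathbf{X}_m - g_{d_m}(\mathbf{Y}_{d_m})\|^2 \,\middle|\, \mathbf{X} = \mathbf{x} \right] \]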



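for any fixed $\mathbf{X} = \mathbf{x}$, and for each decoder $d_m$. Thus, the random variable $E\left[ \|\mathbf{X}_m - g_{d_m}(\mathbf{Y}^{(\mathbf{p}[i])}_{d_m})\|^2 \mid \mathbf{X} \right]$ converges surely to $E\left[ \|\mathbf{X}_m - g_{d_m}(\mathbf{Y}_{d_m})\|^2 \mid \mathbf{X} \right]$.

Finally, by noticing that

\[ E\left[ \|\mathbf{X}_m - g_{d_m}(\mathbf{Y}^{(\mathbf{p}[i])}_{d_m})\|^2 \,\middle|\, \mathbf{X} \right] \le 2 E\left[ \|\mathbf{X}_m\|^2 \,\middle|\, \mathbf{X} \right] + 2M^2 \]

and that this bound has finite expectation,

\[ E\left[ 2 E\left[ \|\mathbf{X}_m\|^2 \,\middle|\, \mathbf{X} \right] + 2M^2 \right] = 2 E\left[ \|\mathbf{X}_m\|^2 \right] + 2M^2 < \infty, \]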



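a second application of the Dominated Convergence Theorem implies that

\[ \lim_{i \to \infty} \frac{1}{n} E\left[ \|\mathbf{X}_m - g_{d_m}(\mathbf{Y}^{(\mathbf{p}[i])}_{d_m})\|^2 \right] = \lim_{i \to \infty} \frac{1}{n} E\left[ E\left[ \|\mathbf{X}_m - g_{d_m}(\mathbf{Y}^{(\mathbf{p}[i])}_{d_m})\|^2 \,\middle|\, \mathbf{X} \right] \right] = \frac{1}{n} E\left[ \|\mathbf{X}_m - g_{d_m}(\mathbf{Y}_{d_m})\|^2 \right] \]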



for each decoder $d_m$. Thus we can choose $i$ large enough so that the performance of $\mathcal{C}_{\mathbf{p}[i]}$ is arbitrarily close to that of $\mathcal{C}$. What is left is to show that we can dispense with random encoders. This is argued in a standard manner by choosing the best realizations of the randomizations $V_i$ at the respective encoders.

Appendix G 
Proof of Lemma 7

Denote the set $\mathscr{S}(p) = \{x \in \mathbb{R}^a : 2^p x \in \mathbb{Z}^a\}$, where $\mathbb{Z}$ is the set of integers. Note that the function in the statement can only take values $f(y)$ where $y \in \mathscr{S}(p)$. Now for each $y \in \mathscr{S}(p)$, define the set $S(y) = \{x \in \mathbb{R}^a : x \ge y, \lfloor x \rfloor_p = y\}$. These sets are disjoint for different values of $y \in \mathscr{S}(p)$ and cover the whole space $\mathbb{R}^a$. Since $f$ takes a constant value in each of the sets $S(\cdot)$, the only regions of discontinuity are the boundaries of these regions. But these boundaries are bounded rectangles, each of which has Lebesgue measure zero, and there are countably many of them, implying that the total region of discontinuity has zero measure. Thus $f$ is locally constant almost everywhere (and hence continuous almost everywhere).
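The last step uses countable subadditivity of Lebesgue measure, which can be spelled out as below. The notation is illustrative: $G$ stands for the grid of points $y$ with $2^p y \in \mathbb{Z}^a$, $\partial S(y)$ for the boundary of the cell attached to $y$, and $\lambda$ for Lebesgue measure on $\mathbb{R}^a$.

```latex
\[
  \lambda\Big( \bigcup_{y \in G} \partial S(y) \Big)
  \;\le\; \sum_{y \in G} \lambda\big( \partial S(y) \big)
  \;=\; 0,
  \qquad \text{since } G \cong \mathbb{Z}^a \text{ is countable and each } \lambda(\partial S(y)) = 0 .
\]
```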



Appendix H 
Dominated Convergence Theorem 

We require the following version of the Dominated Convergence Theorem. 

Theorem 6. Suppose we have a sequence of random vectors $\mathbf{Z}_n \in \mathbb{R}^a$ converging weakly to $\mathbf{Z}$, and two almost-everywhere continuous functions $f, g : \mathbb{R}^a \to \mathbb{R}$ such that $0 \le f \le g$. Then, if $E[g(\mathbf{Z}_n)] = E[g(\mathbf{Z})] = c < \infty$ for all $n$, we have $\lim_{n \to \infty} E[f(\mathbf{Z}_n)] = E[f(\mathbf{Z})]$.



Proof: If we let $X_n = f(\mathbf{Z}_n)$, $Y_n = g(\mathbf{Z}_n)$, $X = f(\mathbf{Z})$ and $Y = g(\mathbf{Z})$, from the almost everywhere continuity of the functions, we have $X_n \to X$ and $Y_n \to Y$ weakly. From Theorem 25.11 in [7], we have that

\[ E[X] \le \liminf_{n \to \infty} E[X_n]. \]



Note that $Y_n - X_n = g(\mathbf{Z}_n) - f(\mathbf{Z}_n)$ is an almost everywhere continuous function of $\mathbf{Z}_n$, hence the sequence of random variables $Y_n - X_n$ converges weakly to $Y - X$. Therefore, since $Y_n - X_n \ge 0$, a second application of Theorem 25.11 yields

\[ \begin{aligned} c - E[X] = E[Y - X] &\le \liminf_{n \to \infty} E[Y_n - X_n] \\ &= \liminf_{n \to \infty} \left( c - E[X_n] \right) = c - \limsup_{n \to \infty} E[X_n]. \end{aligned} \]

Combining both inequalities, we obtain 

\[ \limsup_{n \to \infty} E[X_n] \le E[X] \le \liminf_{n \to \infty} E[X_n], \]

which implies that $\lim_{n \to \infty} E[X_n] = E[X]$. ∎
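To see why the condition $E[g(\mathbf{Z}_n)] = E[g(\mathbf{Z})] = c$ matters, a standard scalar counterexample may help. It is not taken from the paper; $Z_n$, $Z$ and $f$ below are chosen purely for illustration.

```latex
\begin{align*}
& Z_n = \begin{cases} n & \text{with probability } 1/n, \\ 0 & \text{with probability } 1 - 1/n, \end{cases}
  \qquad Z = 0, \qquad f(z) = |z| . \\[4pt]
& P(Z_n \ne 0) = \tfrac{1}{n} \to 0
  \ \Longrightarrow\ Z_n \to Z \text{ weakly}, \qquad
  E[f(Z_n)] = 1 \ \text{for all } n, \qquad E[f(Z)] = 0 . \\[4pt]
& \text{For any } g \ge f: \quad
  E[g(Z_n)] - E[g(Z)] = \frac{g(n) - g(0)}{n} \ \ge\ \frac{n - g(0)}{n} \ >\ 0
  \quad \text{whenever } n > g(0) .
\end{align*}
```

So no dominating function $g$ can satisfy the equal-expectation hypothesis of Theorem 6 for this sequence, which is consistent with the failure of $E[f(Z_n)] \to E[f(Z)]$.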



